Brief Reports


The Journal of Consulting Psychology will 
accept Brief Reports of research studies in 
clinical psychology for early publication with- 
out expense to the author. The procedure is 
intended to permit the publication of soundly 
designed studies of specialized interest or lim- 
ited importance which cannot now be ac- 
cepted because of lack of space. Several pages 
in each issue will be devoted to Brief Reports, 
published in the order of their receipt with- 
out respect to the dates of receipt of the regu- 
lar articles. Most Brief Reports appear in the 
first or second issue to go to press following 
their final acceptance. 


An author who wishes to submit a Brief 
Report: 


1. Sends the Brief Report, limited to one printed 
page and prepared according to the specifications 
given below. 

2. Also sends to the Editor a full report of the re- 
search study, in sufficient detail to give a clear ac- 
count of its background, procedure, results, and con- 
clusions, which will be filed with the American 
Documentation Institute to insure indefinite avail- 
ability. 

3. Prepares at least 100 mimeographed copies of 
the full report, which the author will send without 
charge to all who request it as long as the supply 
lasts. 


4. Agrees not to submit the full report to another 
journal of general circulation. 


Specifications 
Brief Report. The Brief Report should give
a clear, condensed summary of the procedure
of the study and as full an account of the results
as space permits.


To insure that the Brief Report will be no 
longer than one printed page, its typescript, 
including all matter except the title and the 
author’s lines, must not exceed 75 lines av- 
eraging 42 characters and spaces in length. 
Set the typewriter margins for short lines of 
42 characters, which are 3.5 inches long in 
elite typing, and 4.2 inches long in pica. 
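
The arithmetic behind these margins can be checked with a short sketch. It assumes the usual typewriter pitches of 12 characters per inch for elite and 10 for pica, which the notice implies but does not state. The names `line_length_inches` and `fits_quota` are illustrative only.

```python
# Assumed typewriter pitches in characters per inch; not stated in the notice.
PITCH = {"elite": 12, "pica": 10}

MAX_LINES = 75        # typescript lines allowed, per the notice
CHARS_PER_LINE = 42   # average characters and spaces per line, per the notice


def line_length_inches(face: str) -> float:
    """Width in inches of a 42-character line for the given typeface."""
    return CHARS_PER_LINE / PITCH[face]


def fits_quota(lines: list[str]) -> bool:
    """True if a typescript meets the 75-line, 42-character-average limit."""
    if not lines:
        return True
    average = sum(len(line) for line in lines) / len(lines)
    return len(lines) <= MAX_LINES and average <= CHARS_PER_LINE


print(line_length_inches("elite"))  # 3.5
print(line_length_inches("pica"))   # 4.2
```

Under these assumed pitches the two margin settings quoted above follow directly from the 42-character line.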

The manuscript of the Brief Report must 
be double spaced throughout. Except for its 
short lines, it follows the standard style (1). 
Headings, tables, and references are avoided 
or, if essential, must be counted in the 75 
lines. Each Brief Report must be accom- 
panied by a footnote in the style below, 
which is typed on a separate sheet and not 
counted in the 75-line quota: * 


1 An extended report of this study may be obtained
without charge from John Doe, 300 Market
St., Prospect 6, Mass. (giving the author’s full name
and address), or for a fee from the American Documentation
Institute. Order Document No. ——, remitting
$—— for microfilm or $—— for photocopies.





Extended report. The full report is pre- 
pared in the style specified by the Publica- 
tion Manual (1), except that it may be typed 
with single spacing for economy in photo- 
duplication by the ADI. 


Reference 


1. American Psychological Association, Council of
Editors. Publication manual of the American 
Psychological Association (1957 rev.). Wash- 
ington, D. C.: American Psychological Asso- 
ciation, 1957. 
Length of Therapy in Relation to Counselor Estimates
of Personal Integration and Other Case Variables¹


Stanley W. Standal and Ferdinand van der Veen 
University of Chicago 


The relationship between case length and 
the quality or extent of psychotherapeutic 
change in client-centered therapy is virtually 
unknown. In a study of 23 client-centered 
therapy cases, Seeman (18) found a trend in 
favor of higher success ratings by the thera- 
pist for longer cases. The shorter cases 
spanned the entire range of success ratings 
from complete failure, 1, to marked success, 
9, whereas the success ratings for longer cases 
clustered about two high points on the scale 
(points 7 and 8). The variability of ratings 
was significantly lower for the long case 
group. Seeman concluded that further con- 
firmation of these findings could mean that if 
a client is in therapy for at least twenty in- 
terviews, there is a strong assurance of gain 
from therapy, as judged by the counselor. 

In an analysis of 78 client-centered cases, 
Cartwright (4) found support for the trend 
noted by Seeman. In addition, he found that 
rated success as a function of number of in- 
terviews displayed two curvilinear compo- 
nents, one for short cases and one for longer 
cases. For 44 cases having less than 19 inter- 
views, the correlation ratio (eta) of success 
rating on number of interviews was .66, which 
was significant at better than the .01 level. 
For the 42 cases with 14 or more interviews 
the correlation ratio was .67, which also was 
significant at better than the .01 level of 
confidence. 
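
For readers unfamiliar with the correlation ratio, the following is a minimal sketch of how eta can be computed when cases are grouped by length. The grouping labels and ratings are hypothetical and are not Cartwright's data. The function takes eta as the square root of the between-group sum of squares over the total sum of squares, the standard definition.

```python
from collections import defaultdict
from math import sqrt


def correlation_ratio(groups, values):
    """Eta of `values` on the categorical grouping `groups`."""
    by_group = defaultdict(list)
    for g, v in zip(groups, values):
        by_group[g].append(v)
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
        for vs in by_group.values()
    )
    return sqrt(ss_between / ss_total) if ss_total else 0.0


# Hypothetical length bands and success ratings (1 to 9), chosen so that
# success rises and then falls with length, i.e., a curvilinear pattern.
bands = ["short", "short", "mid", "mid", "long", "long"]
ratings = [3, 5, 8, 6, 4, 2]

eta = correlation_ratio(bands, ratings)
print(round(eta, 2))
```

Unlike the product-moment coefficient, eta does not assume a straight-line relationship, which is why it suits the curvilinear components described above.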

In view of the clinical as well as theoreti- 
cal importance which generally attaches to 
case length, simpler and/or more definitive 


1 This work was supported in part by a research 
grant (PHS M 903) from the National Institute of 
Mental Health, of the National Institutes of Health, 
United States Public Health Service. 





relationships between this variable and meas- 
ures of therapeutic process and outcome might 
be expected. In the present investigation, esti- 
mated changes in personality integration, life 
adjustment, and other case variables will be 
studied as additional factors possibly related 
to case length. 

The several case variables under considera- 
tion are derived from Seeman’s case rating 
scale (18) which comprises ten items designed 
to assess various aspects of the process and 
outcome of client-centered therapy.² All items
are rated on a scale from 1 to 9. The first 
eight items require a beginning of therapy 
and end of therapy rating and are as follows: 


Item 1. The degree to which therapy was an intel- 
lectual-cognitive process for the client. Little or none 
(1) to maximally or exclusively (9). 

Item 2. The degree to which therapy was an emo- 
tional-experiential process for the client. Little or 
none (1) to maximally or exclusively (9). 

Item 3. The degree to which the client perceived 
therapy as a process of personal exploration or as 
specific analysis of life-situations. Situational (1) to
personal exploration (9). 

Item 4. The degree to which the client used the re- 
lationship itself as a focus for therapy. Negligible 
extent (1) to maximally (9). 

Item 5. Estimate of the client’s attitude toward 
you during the course of therapy. Strong dislike (1)
to strong liking or respect (9). 

Item 6. Estimate of your feelings toward the client. 
Strong dislike (1) to strong liking or respect (9). 

Item 7. The degree of personal integration of the 
client. Highly disorganized or defensively organized 
(1) to optimally integrated (9). 

Item 8. The life adjustment of the client. Low (1) 
to high (9). 


The last two items require only an end of 
therapy rating: 


2 The scale was developed jointly by Drs. Julius
Seeman and Nathaniel J. Raskin. 
Item 9. The degree of satisfaction of the client
with the outcome of therapy. Strongly dissatisfied 
(1) to extremely satisfied (9). 

Item 10. Your rating of the outcome of therapy. 
Complete failure (1) to marked success (9). 


A clinical appraisal of the various items 
suggests that the variable most likely to be 
related to case length is personal integration. 
As described in Item 7 it implies personality 
reorganization, which seems to be regarded 
by psychotherapists of most persuasions as a 
process which is both long and gradual. 
Client-centered therapists have tended to be 
less concerned with case length, but the writ- 
ings of theorists in this approach undoubtedly 
convey the impression that personality reor- 
ganization is an extensive and gradual proc- 
ess. For example, Rogers wrote: 


The best definition of what constitutes integration 
appears to be this statement that all the sensory and 
visceral experiences are admissible to awareness 
through accurate symbolization, and organizable 
into one system which is internally consistent and 
which is, or is related to, the structure of self (15, 
pp. 513-514). 


And in describing the process of becoming 
integrated: 


Exploration of experience is made possible by the 
counselor, and since the self is accepted at every 
step of its exploration and in any change it may 
exhibit, it seems possible gradually to explore areas 
at a “safe” rate, and hitherto denied experiences are 
slowly and tentatively accepted . . . (15, p. 518). 

Gradually he [the client] comes to experience the 
fact that he is making value judgments . . . (15, p.
522).


Standal (19) saw the fundamental unit of 
personality reorganization as (a) the percep- 
tion by the client of the therapist’s attitude 
of unconditional positive regard in relation to 
some experience tentatively symbolized and 
expressed as “self”; (b) the transformation
of this “external” positive regard to self-re- 
gard; and, (c) the generalization of this 
newly developed self-regard to similar or re- 
lated denied or distorted experiences, which, 
in turn, may then be tentatively symbolized 
and expressed as “self.” As to the number of 
such fundamental transactions, psychothera- 
peutic change is “. . . the fusion of thou- 
sands of instances of the process we have just 
described . . .” (19, p. 100). And a sum- 
mary statement which suggests the association 


of personality reorganization with extent of 
therapeutic contact is the following: “The 
maladjusted individual has an extensive sys- 
tem of conditions of worth which are rela- 
tively impervious to anything but a sustained 
relationship characterized by positive regard 
transactions as extensive as those upon which 
the existing self-regard structure was built” 
(19, p. 111). 

One client-centered theorist has taken a 
position which implies that personality change 
may not take so long as is ordinarily be- 
lieved. Analyzing the processes of psycho- 
therapy and personality reorganization in 
terms of learning and perception theory, 
Butler (1) argued that when the therapist 
consistently communicates understanding and 
acceptance from the very beginning of ther- 
apy, he is less likely to arouse many time- 
consuming and perhaps unnecessary resist- 
ance reactions in the client. The implication is 
that client-centered therapy may require less 
time for a given degree of personality reor- 
ganization than does an approach which in- 
volves a period of relatively passive behavior 
on the part of the therapist followed by a pe- 
riod of systematic interpretation. Neverthe- 
less, it is clear throughout most of the paper 
that Butler still sees personality reorganiza- 
tion as a gradual and time-consuming process 
even under optimal circumstances. 

Changes in the variables represented by 
the other items of the rating scale may often 
be closely related to time, but unlike changes
in personal integration they may easily be 
envisaged as occurring over very short pe- 
riods. For example, although one usually ex- 
pects therapy to become a more emotional- 
experiential process (Item 2), with certain 
clients or certain client-counselor combina- 
tions therapy may be heavily emotional-ex- 
periential from the beginning. Similarly with 
the liking or respect the client and counselor 
have for each other (Items 5 and 6). Al- 
though mutual respect between client and 
counselor is likely to grow with increasing 
therapeutic contact, they will often like and 
respect each other immediately, or greatly 
increase mutual liking and respect on the 
basis of a few interviews. Even movement to- 
ward life adjustment (Item 8), which is often 
a very lengthy process, may proceed rapidly 
with some stroke of good fortune, or through
a single fresh perception of some key prob- 
lem. Similar arguments can be advanced for 
each of the other items, including the judg- 
ment of over-all success (Item 10), the most 
frequently studied item of the scale. “Success”
has no specific referents but presumably
is based upon a combination of case factors— 
those included in the scale as well as others 
which may be highly idiosyncratic to the 
counselor. A short case can be judged as 
highly successful almost exclusively on fac- 
tors relatively independent of case length, 
e.g., rapport, insights achieved, displays of 
emotion, client satisfaction, and so forth. 

With the above considerations in mind, the 
following two hypotheses are advanced: 

1. Movement toward personal integration 
is positively related to length of therapy. 

2. Movement toward personal integration is
more highly related to length of therapy than 
is change or outcome on any other of the 
nine case variables. 

Although the other nine case variables are 
not likely to be as closely related to length of 
therapy as movement toward personal inte- 
gration, clinical experience suggests that they 
are all somewhat dependent upon the amount 
of contact between client and therapist. Ac- 
cordingly, it is hypothesized that: 

3. All the other variables defined by the 
items of the case rating scale are related to 
length of therapy. 


Procedure 


Subjects. The data were taken from the 
cases of 73 research clients who were seen for 
at least two interviews at the Counseling Cen- 
ter, University of Chicago, during the period 
1949-1954.³ To include cases of one interview,
as does Cartwright (4), has the advantage
of not excluding data which may be
pertinent, but it can be argued that single in- 
terview cases may often be nothing more than 
“preliminary interviews” in which the client 
has simply sized up the situation and decided 
against entering therapy. When a rather high 
cutoff point is selected, as in the studies by 
Seeman (18) and others (17) where only 
cases of six or more interviews were used, the 


3 The sample of subjects is almost, but not quite,
identical with that studied by Cartwright (4). 
chances of including pseudo-cases are greatly
decreased, but the chances of excluding real 
cases are increased considerably. In the pres- 
ent study, cases with but one interview 
were eliminated on the assumption that, even 
though the client had had a preliminary in- 
terview with a different counselor, the first 
interview with the therapist proper consti- 
tutes a preliminary interview from the client’s 
point of view. Where the client continued 
with therapy, however, the first interview 
was assumed to have been therapeutic and 
hence justifiably included in case length. It 
also might be mentioned that the inclusion of 
single interview cases tends to raise slightly 
the correlations to be reported, so the exclu- 
sion of such cases leads to more conservative 
estimates of the relationships. 

The subjects were 42 males and 31 females, 
25 of whom were community clients and 48 
of whom were students. The mean age was 
26.7 with a standard deviation of 4.5. Al- 
though many of the clients were referred to 
the Center, all came of their own volition and 
participated in the various research projects 
on the same basis. They were seen by 16 dif- 
ferent therapists, two of whom were females. 
The therapists ranged in experience from 
about one year to over 15 years of thera- 
peutic work. The largest proportion had from 
three to six years of experience. 

Case length. For this study case length, the 
independent variable, is the amount of time 
spent with the therapist and is measured in 
terms of number of interviews, each interview 
being slightly less than one hour long. The 
decision to end therapy was almost invariably 
left to the client. 

The distribution of case length is presented 
in Table 1, and is highly positively skewed. 
In order to fulfil the assumption of normality 
for the Pearson product-moment correlation 
coefficient and to obtain simpler relation- 
ships, the logarithm to the base 10 of the 
number of interviews was calculated for each 
case. The transformed distribution, as shown 
in Table 1, approximates normality. Also it 
was found that the transformation produced 
a more linear relationship between case length 
and movement toward personal integration. 
The transformed values were used in all sta- 
tistical calculations involving case length. In 





Table 1

Distributions of Cases by the Number of Interviews
and by the Logarithm of the Number of
Interviews of Each Case

Number of           Number      Log10 of the number    Number
interviews          of cases    of interviews          of cases

2-8                    17         .30- .49                 1
9-15                   16         .50- .69                 3
16-22                   6         .70- .89                 9
23-29                   7         .90-1.09                16
30-36                   5        1.10-1.29                10
37-43                   4        1.30-1.49                 8
44-50                   4        1.50-1.69                12
51-57                   4        1.70-1.89                10
58-64                   3        1.90-2.09                 2
65-71                   1        2.10-2.29                 2
72 plus (to 178)        6

N = 73                           N = 73
Mean = 30.69                     Mean = 1.281
SD = 32.53                       SD = 0.428
Median = 18                      Median = 1.255
                                 Antilog of Mean = 19.1
                                 Antilog of Median = 18.0





evaluating our results it should be remem- 
bered that this transformation makes differ- 
ences in case length for shorter cases much 
more important than corresponding differ- 
ences in length for longer cases. 
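
The effect of the transformation can be illustrated with a short sketch. The interview counts below are hypothetical, chosen only to be positively skewed, and are not the study's data. The last line checks the antilogs reported in Table 1 from the tabled mean and median of the logarithms.

```python
from math import log10
from statistics import mean

# Hypothetical interview counts, positively skewed; not the study's data.
interviews = [2, 4, 6, 9, 12, 15, 18, 25, 40, 70, 178]

log_lengths = [log10(n) for n in interviews]

raw_mean = mean(interviews)
geo_mean = 10 ** mean(log_lengths)  # antilog of the mean log (geometric mean)

# Equal ratios of length give equal log differences, so a step from 2 to 4
# interviews counts as much as a step from 50 to 100.
short_step = log10(4) - log10(2)
long_step = log10(100) - log10(50)

print(round(raw_mean, 2), round(geo_mean, 2))
print(round(short_step, 3), round(long_step, 3))
print(round(10 ** 1.281, 1), round(10 ** 1.255, 1))  # Table 1 antilogs
```

Because the antilog of the mean logarithm is pulled far less by the few very long cases than the raw mean is, it falls close to the median, as in Table 1.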

As has been indicated, the dependent vari- 
ables of this study were inferred from the 
case-rating scale used by Seeman (18). All 
ratings were made by the counselor at the 
termination of therapy. The first eight items 
required a rating of the client for the begin- 
ning and for the end of therapy. The differ- 
ence between these two values thus represents 
a movement score and was used in all calcu- 
lations for the first eight items. The last two 
items, client satisfaction and success, required 
a rating only for the end of therapy. Since 
counselor judgments were the only estimates 
of change or of outcome on all ten case vari- 
ables, their reliability and validity deserve 
considerable attention. 

Reliability. The reliability of counselor 
judgments is difficult to estimate accurately 
since even the simplest prerequisites for such 
estimates are either difficult or impossible to 
meet. If ratings are to be made on the basis 
of the total therapeutic situation, then interjudge
agreement cannot be found since only
one therapist is intimately familiar with each 
case. Estimates of intrajudge consistency, al- 
though not impossible, involve considerable 
difficulty because of the nature of the scale 
as well as the subject matter. The shortness 
of the scale and the type of questions make
a split-half or alternate-forms approach rela- 
tively unfeasible. 

A simple test-retest is, strictly speaking, 
impossible because the counselor cannot be 
exposed to the total therapeutic situation 
twice. The customary procedure is to read- 
minister the scale as an approximation of 
test-retest conditions. This procedure was fol- 
lowed in the present study but its limitations 
are apparent. The rating of a case involves 
thought and a decision as to the number to 
assign to a given item. If the second rating is 
made after a short time, it may be based 
largely on memory of the first rating, result- 
ing in a spuriously high estimate of reliabil- 
ity. On the other hand, if the first and second 
ratings span a time interval long enough to 
allow the first rating to be forgotten, the 
counselor may have also forgotten many as- 
pects of the case. He may also have changed 
considerably his frame of reference about 
therapy. These two factors would result in an 
underestimation of reliability. Of the two pro- 
cedures the latter is usually followed since it 
leads to an estimate of minimal reliability. 

Seeman (18) had seven cases rerated after 
a mean interval of five months and found a 
mean correlation between judgments for all 
items (using Fisher’s normalizing transforma- 
tion) of .81. Cartwright (4) had seven coun- 
selors rerate 15 cases on Item 10, success, 
after a mean interval of 14.2 months and 
found a correlation of .86. 
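
Averaging correlations through Fisher's normalizing transformation can be sketched as follows. The item correlations are hypothetical and are not Seeman's values. An unweighted mean is assumed here, since the source does not say how the items were weighted.

```python
from math import atanh, tanh


def mean_correlation(rs):
    """Average correlations on Fisher's z scale and transform back to r."""
    zs = [atanh(r) for r in rs]          # z = arctanh(r)
    return tanh(sum(zs) / len(zs))       # back-transform the mean z


# Hypothetical item-level rerating correlations; not Seeman's data.
item_rs = [0.60, 0.75, 0.85, 0.90]

print(round(mean_correlation(item_rs), 3))       # mean via Fisher's z
print(round(sum(item_rs) / len(item_rs), 3))     # plain arithmetic mean
```

For positive correlations the z-based mean is never lower than the arithmetic mean, because high coefficients are stretched more by the transformation.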

In the present study five counselors rerated 
all ten items for 11 cases after a median in- 
terval of 34 months. Table 2 presents the 
correlations between these ratings. Reliability 
for Items 7, 8, 9, and 10 ranged from fair to 
excellent. For the other items the reliability 
was poor or doubtful. In evaluating these
results it should be noted that the sample was 
small and that the time interval between rat- 
ings was very large. For the beginning rat- 
ings the latter factor was even more critical 
than for the terminal ratings since the in- 
Table 2

Reliability Coefficients Between First and Second Counselor Ratings for Beginning, Terminal,
and Movement Scores on Ten Case Variables

(N = 11, except for B and M coefficients for Item 4 where N = 10)

Case Rating Scale Items

Counselor
rating 1 2 3 4 5 6 7 8 9 10
Beginning Ser BB 05 —.63* — — 50 .67* 
Terminal 36 .70** .28 .63* Al .61* .68* .67* oe 86—67* 
Movement 40 55 —.27 54 -- = —. .69** 
* p < .05
** p < .02
*** p < .001


terval between the start of therapy and the 
second rating was even longer. On several 
cases, and notably for beginning ratings on 
Items 5 and 6, some counselors could not re- 
member enough to attempt a second rating. 

Validity. Although demonstrations of the 
validity of counselor judgments based upon 
first order criteria are almost unavailable, re- 
lationships with many lesser order criteria 
have been established over the past few years. 
Citing earlier studies, Seeman (18) pointed 
to a significant relationship between counselor 
ratings of success and a rising ratio of posi- 
tive attitudes as therapy proceeds (14), a 
significant composite correlation (rho = .70) 
between counselor ratings and five experi- 
mentally independent process measures (13), 
a significant correlation between case ratings 
and MMPI changes over therapy (12), a cor- 
respondence between Rorschach change and 
case rating (10), a correlation between case 
ratings and Rorschach changes significant at 
the 10 per cent level (11), and three findings 
of no relationship between Rorschach changes 
and counselor ratings (3, 9, 12). 


In evaluating these results it should be pointed 
out that the process measure, rise in positive atti- 
tudes, studied by Raimy (14), and the process meas- 
ures used in the five studies analyzed by Raskin 
(13) (i.e., attitudes toward self, acceptance of and
respect for self, understanding and insight, maturity 
of behavior reported by the client, and defensive- 
ness) were derived from transcriptions of the thera- 
peutic interviews and also are just the kinds of 
criteria a client-centered therapist might use in 
evaluating the success of a case. These two factors 
detract from the validity implications of the results. 
On the positive side, however, a body of clinical 
and theoretical knowledge supports the notion that 
changes in these process measures are associated with 





a healthier adjustment to life. The results thereby 
lend some indirect support to our confidence in the 
functional validity of counselor judgments. Simi- 
larly, the findings in two studies of positive change 
on MMPI factors and Rorschach perceptions indi- 
cate that counselor judgments differentiate some kind 
of behavior which may in turn be related to a 
healthier personality reorganization and life adjust- 
ment. The implications of the Rorschach finding, 
however, are seriously mitigated by the four other 
Rorschach studies reporting nonsignificant results. 
In more recent studies, Butler and Haigh (2) dis- 
covered a significant increase in self-esteem, as in- 
ferred from the degree of congruence between self 
and self-ideal concepts, for a “definitely improved” 
group of clients as compared with a “not definitely 
improved” group and a control group. Counselor 
judgments of success were one of the two criteria of
improvement. Dymond (5) found counselor judg- 
ments of success to be significantly related to ad- 
justment scores based on clinicians’ judgments of 
Q-sort statements. Gordon and Cartwright (7) also 
reported a significant correlation (rho = .60) be- 
tween rated success and these Q-adjustment state- 
ments. Vargas (21) found significant relationships 
(rhos ranging from .64 to .99) between judged suc- 
cess and six indices of increasing self-awareness. 


As support for the validity of counselor 
judgments the above findings are subject to 
the same kinds of limitations and advantages 
discussed previously with respect to process 
measures of client-centered therapy. The re- 
lation of the criterion variables to adjustment 
in everyday life is unknown, and the data 
upon which they are based, i.e., client state- 
ments about self, are the same kind as those 
available to the counselors and would be 
likely to influence judgments of success. 


The positive findings of two other studies appear 
to have the advantage of being more independent of 
counselor judgments. Dymond (6) found counselor 
evaluations of success to be significantly correlated 
with ratings of adjustment based on TAT analyses
after a follow-up period of six months. Tougas (20) 
found a relationship between rated improvement and 
degree of ethnocentrism, there being a significant 
likelihood that a client would be rated from 5 
through 9 on global success if his ethnocentrism 
score was at or below the mean upon entering 
therapy. A third study, with a criterion variable of 
about the same order as the above two, reported 
findings which are unequivocally nonsupportive. 
Grummon and John (8) found no correlation be- 
tween counselor judgments and mental health, as in- 
dicated by the ratings of TAT stories by a psycho- 
diagnostician on several scales based on a psycho- 
analytical conception of mental health. 

The clearest findings bearing on the functional va- 
lidity of counselor judgments are those of Rogers 
(16) who reported a significant relationship (r =
.41) between counselor ratings of success and independent
ratings by two friends of the client on a
scale designed to measure emotional maturity, an 
adaptation of the Willoughby Emotional Maturity 
Scale. The criterion variable was based on behaviors 
in everyday life which also are relatively unavail- 
able to the counselor. 

The findings cited above bear only on ratings of 
success, i.e., Item 10 on the scale. As to the other
items, Vargas (21) found five measures of develop- 
ing self-awareness to be significantly correlated (rs 
of .67 to .86) with ratings of personal integration 
(Item 7), and a sixth measure not significantly re- 
lated. Vargas (21) also reported negative correla- 
tions between personal integration and measures from 
four of the rating scales devised by Grummon and 
John (8). Rogers (16) noted a significant positive 
correlation (r = .50) between the degree of change in 
personal integration (Item 7) and the degree of 
change in maturity of behavior as seen by friends 
over the period of therapy. Over the period from 
before therapy to the follow-up point the correlation 
was higher (r = .67). Finally, Seeman (18) reported 
significant correlations (rs ranged from .20 through 
.89) between the first nine items and the success
rating. 


To summarize the findings on validity, 
there is some evidence against and consider- 
ably more evidence for the belief that coun- 
selor judgments of success are fairly well re- 
lated to test, therapy, and social behaviors 
thought to be indicative of success in psycho- 
therapy. In two studies counselor judgments 
of personal integration showed fair to high 
relationships with therapy behavior and fair 
to good relationships with maturity of be- 
havior in everyday life. One study showed 
negative correlations between ratings of per- 
sonal integration and ratings of test behav- 
ior. Another study reported low to very high 
relationships between rated success and coun- 
selor ratings on the nine other variables. 








Results and Discussion 


Case length and movement toward personal 
integration. The product-moment correlation 
between case length and movement toward 
personal integration (with number of inter- 
views transformed logarithmically) is .58, 
which is significant beyond the .001 level of 
confidence (Table 3). Figure 1 shows mean 
movement scores plotted against log number 
of interviews. The values for the points of the 
graph may be found in Table 4. 
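
The significance level quoted for this coefficient can be checked with the usual t test of a product-moment correlation against zero. The sketch below is illustrative and is not taken from the authors' computations. The r and N values are those given in Table 3 for Items 7 and 3. The critical values in the comments are standard two-tailed table entries for 60 degrees of freedom, used here as slightly stricter stand-ins for 70.

```python
from math import sqrt


def t_for_r(r: float, n: int) -> float:
    """t statistic for testing a Pearson r against zero, with n - 2 df."""
    return r * sqrt((n - 2) / (1 - r * r))


t_integration = t_for_r(0.58, 72)   # Item 7, personal integration
t_situational = t_for_r(0.16, 72)   # Item 3, personal-situational

# Two-tailed critical t for 60 df: about 3.46 at .001 and about 1.67 at .10.
print(round(t_integration, 2))  # well above 3.46, so p < .001
print(round(t_situational, 2))  # below 1.67, so p > .10
```

Both outcomes agree with the significance levels reported in Table 3 for these two items.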

The first hypothesis of this study, that 
movement toward personal integration is posi- 
tively related to case length, is supported. It 
may be said with considerable confidence that 
change in personal integration has a fairly 
good linear relationship with the logarithm of 
case length. Conversely, case length may be 
regarded as a more meaningful variable than 
it has hitherto appeared. 

Movement toward personal integration vs. 
all other case variables. Referring to Table 3, 
it will be seen that, although most of the 
other variables have significant correlations 
with log case length, movement toward per- 
sonal integration has the highest (r = .58). 
The next highest is over-all success (r = .37). 
The difference between .58 and the values of 


Table 3

Product-Moment Correlation Coefficients Between Log
Number of Interviews and Counselor Judgments
of Movement or Outcome* on
Ten Case Variables

                                     Correlation
                                     with log
Item                                 length          N       p†

 1. Intellectual-cognitive            -.28           73      .02
 2. Emotional-experiential             .32           73      .01
 3. Personal-situational               .16           72     >.10
 4. Focus on relationship              .33           73      .01
 5. Client’s liking or respect         .29           73      .02
 6. Therapist’s liking or respect      .18           73     >.10
 7. Personal integration               .58           72      .001
 8. Life adjustment                    .32           69      .01
 9. Client satisfaction                .23                   .05
10. Global success                     .37                   .01

* Items 9 and 10 were judged as to outcome alone. All
others were judged for the beginning and for the end of therapy.

† Although several tests logically could be one-tailed, two-tailed
tests were used throughout for convenience of presentation.
No nonsignificant correlation achieves significance at
the .05 point.
Fig. 1. Movement on personal integration as a function
of log number of interviews for 72 cases.


r for each of the other variables (using Fisher’s
transformations) is significant beyond the
.01 level (t ≥ 3.30).

These results clearly support the second
hypothesis under consideration, that change 
in personal integration is more highly related 
to case length than change or outcome on 
nine other case variables generally deemed 
important in client-centered therapy. Perhaps 
personal integration should be given a larger 
role in future studies of client-centered ther- 
apy. By and large the over-all success rating 
has been used as the major case variable 
where counselor judgments are concerned. If 
movement toward personal integration is more 
highly related to actual amount of therapy 
than is the estimated degree of success, it may 
be a more fruitful variable to study. 

Case length and movement or outcome on 


Length of Therapy in Relation to Personal Integration 






other case variables. Seven case variables 
have low but significant correlations with log
case length: success rating of the case (r =
.37); change in the degree to which the client
used the relationship itself as a focus for therapy
(r = .33); change in the life adjustment
of the client (r = .32); change in the degree
to which therapy was an emotional-experiential
process for the client (r = .32); change
in the client’s attitude of liking or respect for
the therapist (r = .29); change in the degree
to which therapy was an intellectual-cognitive
process for the client (r = −.28); and
the client’s satisfaction with the outcome of
therapy (r = .23). Two case variables did
not have significant correlations with case
length: change in the therapist’s attitude of
liking or respect for the client (r = .18); and
change in the degree to which the client perceived
therapy as a process of personal exploration
as opposed to an analysis of life
situations (r = .16). The manner in which
the scores for the various items are distributed
is shown in Table 4, which presents the mean
movement or outcome scores for successive intervals
of log case length for each of the
items.
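The grouping of cases by log case length can be illustrated with a short sketch. It maps a raw number of interviews to the Table 4 interval it falls in, using the interval limits and raw ranges printed at the head of that table. The function name and the rounding of the logarithm to two decimals before grouping are assumptions made for illustration; the text does not state how the logs were rounded.

```python
import math
from bisect import bisect_right

# Lower limits of the log-length intervals in Table 4, in hundredths.
LOWER_EDGES = [30, 70, 90, 110, 130, 150, 170, 190, 210]
LABELS = [".30-.69", ".70-.89", ".90-1.09", "1.10-1.29", "1.30-1.49",
          "1.50-1.69", "1.70-1.89", "1.90-2.09", "2.10-2.29"]


def log_interval(n_interviews: int) -> str:
    """Return the Table 4 interval label for a raw number of interviews."""
    # Assumption: common logs were rounded to two decimals before grouping.
    hundredths = round(100 * math.log10(n_interviews))
    if not 30 <= hundredths <= 229:
        raise ValueError("case length outside the range covered by Table 4")
    # bisect_right counts the lower limits that are <= the rounded log.
    return LABELS[bisect_right(LOWER_EDGES, hundredths) - 1]


for n in (2, 4, 5, 12, 13, 50, 125, 197):
    print(n, log_interval(n))
```

Under this rounding assumption, each raw range printed in Table 4 (for example 2-4, 5-7, and 125-197 interviews) falls wholly inside its log interval.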

The results support the third hypothesis, 
that the nine other case variables are related 
to case length, for seven of the variables, but 
not for two. As predicted, the relationships 
are significant but not strong. It seems clear 
that factors other than amount of therapeutic 
contact are largely responsible for change or 


Table 4


Mean Movement or Outcome* Scores on Ten Case Variables for Successive Intervals of Log Case Length

Case-Rating  Log Number of Interviews (Raw Number of Interviews)
Scale Item   .30-.69    .70-.89    .90-1.09   1.10-1.29  1.30-1.49  1.50-1.69  1.70-1.89  1.90-2.09  2.10-2.29
             (2-4)      (5-7)      (8-12)     (13-19)    (20-31)    (32-49)    (50-78)    (79-124)   (125-197)
 1           −1.75      −0.71      −1.50      −0.90      −2.50      −2.17      −3.30      −2.50      −1.00
 2            2.25       1.00       0.63       1.40       2.50       2.25       3.30       4.50      −0.50
 3            2.50       1.33       1.75       1.40       2.50       2.50       2.40       0.50       5.00
 4            0.25       0.44       1.50       1.60       1.50       1.83       2.20       3.00       2.50
 5            0.50       0.71       1.37       1.20       1.13       1.75       2.80       3.00      −0.50
 6            1.75       1.00       1.56       1.20       0.87       1.83       2.10       2.50       0.00
 7            0.25       1.11       1.81       1.30       3.00       3.09       3.60       3.50       3.50
 8            1.75       2.00       1.93       0.89       2.37       2.36       3.20       2.50       4.00
 9            5.25       5.00       6.20       4.30       5.50       6.33       6.50       6.50       6.00
10            5.00       4.11       5.63       3.20       6.25       6.00       6.60       6.50       5.50

* Items 9 and 10 were rated on outcome alone.


Table 5


Product-Moment Correlation Coefficients Between
Movement on Item 7 (Personal Integration)
and Movement on Items 1 Through 8 and
Outcome on Items 9 and 10

                                  Correlation
Item                              with Item 7    N     p-value
 1. Intellectual-cognitive           −.42        72     .001
 2. Emotional-experiential        [illegible]
 3. Personal-situational          [illegible]
 4. Focus on relationship             .26        72     .05
 5. Client’s liking or respect        .52        72     .001
 6. Therapist’s liking or respect [illegible]
 8. Life adjustment                   .66        69     .001
 9. Client satisfaction           [illegible]
10. Global success                [illegible]





outcome along these nine various dimensions. 
Movement toward personal integration and 
other case variables. Table 5 presents the cor- 
relations between movement on personal inte- 
gration and the other case variables. No hy- 
potheses have been advanced concerning these 
relationships, but it is of interest to compare 
them with their individual correlations with 
log case length. Although movement on per- 
sonal integration correlated fairly well with 
several other items, these items did not correlate
nearly as well with log case length.
Case length and all other case variables. 
As an over-all test of the relationship be- 
tween case length and all other case variables 
a two-way analysis of variance was calculated 
for the data in Table 4.⁴ Table 6 presents the
results of this analysis. The effect of length 
is highly significant (F = 6.39, p < .001), 
which lends additional support to the rele- 
vance of case length for the study of therapy. 
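As a check on the arithmetic, the variance estimates and the F ratio can be recomputed from the sums of squares and degrees of freedom printed in Table 6. This is only a sketch that reproduces the reported figures; the variable names are illustrative, and it does not rerun the analysis from the cell means.

```python
# Sums of squares and degrees of freedom as printed in Table 6.
sources = {
    "Between items": (207.2360, 9),
    "Between length intervals": (37.7307, 8),
    "Within groups": (53.1144, 72),
}

# Variance estimate (mean square) = sum of squares / degrees of freedom.
mean_squares = {name: ss / df for name, (ss, df) in sources.items()}

total_ss = sum(ss for ss, _ in sources.values())
total_df = sum(df for _, df in sources.values())

# F for case length: length mean square over the within-groups mean square.
f_length = mean_squares["Between length intervals"] / mean_squares["Within groups"]

for name, ms in mean_squares.items():
    print(f"{name}: {ms:.4f}")
print(f"Total SS = {total_ss:.4f}, total df = {total_df}")
print(f"F (length) = {f_length:.2f}")
```

The degrees of freedom are consistent with 10 items by 9 length intervals: 9 for items, 8 for intervals, and 9 × 8 = 72 for the remainder, giving 89 in all.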


Summary and Conclusions 


On the assumption that case length should 
be more clearly related to therapeutic change 
than the results of previous studies indicated, 
it was compared with movement toward per- 
sonal integration, movement toward life ad- 
justment, over-all success, and several other 
variables derived from counselor judgments 
on 73 cases of two or more interviews in 


4 The signs of the means of Item 1 were reversed 
to simplify the analysis. 
which the client-centered approach was used.
On clinical as well as theoretical grounds it 
was hypothesized that movement toward per- 
sonal integration would be more highly re- 
lated to case length than would any other 
case variable, but that all case variables would 
be related to case length. To fulfil the assump- 
tion of normality and to obtain more linear 
relationships the number of interviews for 
each case was transformed logarithmically. 

The reliability and validity of counselor 
judgments were discussed. Although the as- 
sumptions for reliability estimates were not 
met, a reasonable case was made for taking 
the agreement between two widely spaced 
administrations of the counselor rating scale 
as an estimate of minimal reliability. Estimates
from this study, as well as previous
ones, indicated a fair to good degree of reli- 
ability of counselor judgments on some items 
and poor or no reliability on others. The va- 
lidity of two of the items, success and per- 
sonal integration, was, in general, supported. 
For the validity of the other items there was 
too little evidence to warrant any conclusions. 

The two major hypotheses were fully sup- 
ported by the results. The correlation be- 
tween movement toward personal integration 
and log case length was .58, which was sig- 
nificant at the .001 level of confidence. The 
values of t for the differences between the
correlation of movement toward personal inte- 
gration with log case length and the correla- 
tions of the other items with log length were 
equal to or larger than 3.30, and significant 
at less than the .01 level. 

The hypothesized relationships between case
length and the other case variables were supported
in the majority of instances. Low but
significant correlations were found between
log case length and seven variables. Two case
variables correlated with log case length in
the predicted direction, but the correlations
were not significant. The F for the effect of
log case length on all the variables was highly
significant.


Table 6


Summary of the Analysis of Variance for Successive
Intervals of Log Case Length Versus Mean
Scores on Ten Case Variables

Source of variation         Sum of squares   df   Variance estimate
Between items                     207.2360    9        23.0262
Between length intervals           37.7307    8         4.7163*
Within groups                      53.1144   72         0.7377
Total                             298.0811   89

* Length F = 6.39; n₁ = 8, n₂ = 72; p < .001.

The major conclusions were: (a) Change 
in level of personal integration is positively 
related to case length. Such change has a 
moderate linear relationship with log case 
length. (b) Change in level of personal integration
is more highly related to case length
than change or outcome on other important 
case variables. (c) Most case variables are 
slightly related to length of therapy. (d) 
With respect to actual amount of therapy, 
change in personal integration may be more 
important than rated success or other case 
variables. (e) Case length can be a meaning- 
ful variable in the study of therapy. 


Received May 17, 1956. 
The Assessment of Communication: The Relation
of Clinical Improvement to Measured Changes
in Communicative Behavior


Betty L. Kalis and Lillian F. Bennett 
San Francisco, California


The assessment of recovery from mental 
illness which requires hospitalization may 
center on a patient’s sociological recovery, on 
his capacity to regain his place in the com- 
munity and to remain outside of the hospital, 
or on his psychological equilibrium, on the 
cohesiveness of his personality integration. 
Social recovery is relatively easy to ascertain 
on an actuarial basis after the patient has 
left the hospital. Psychological recovery is 
more difficult to measure, partly because ade- 
quate independent criteria of such recovery 
have not been established. Most difficult of 
all is the assessment of recovery while the 
patient is still in the hospital. It is possible 
to reach agreement about the extent of im- 
provement, however, by inquiring of those 
who have contact with the patient, and it 
is on the basis of such inquiries that dis- 
charges from psychiatric hospitals are fre- 
quently made. The psychiatrist who sees the 
patient in therapy discusses changes he has 
observed in the interview. The nurses report 
changes in his behavior on the ward. The 
psychologist notes variations in his diagnos- 
tic test protocols. Perhaps the social worker 
brings data from relatives who have seen the 
patient for week-end visits. And the patient 
gives his own impressions of his readiness to 


1 This fifth study under a common title is part of 
a long-term research project on communication carried
out at the Department of Psychiatry, University
of California School of Medicine, and the Langley
Porter Clinic, San Francisco, under the direction
of Dr. Jurgen Ruesch. This investigation was sup- 
ported in part by a research grant (M-534) from 
the National Institute of Mental Health of the Na- 
tional Institutes of Health, Public Health Service. 
leave. After pooling all this information, a
decision is made which is based on a com- 
parison of the patient’s behavior with the be- 
havior of other patients who were discharged. 
In reviewing all the criteria used in the as- 
sessment of improvement, it would appear 
that changes in the communicative behavior 
of the patient heavily influence the decision 
of the physician. Ruesch (6) has called at- 
tention to the fact that the vast majority of 
terms used in psychiatry refer to the com- 
municative behavior of patients, and that, 
in fact, all psychopathology can be viewed 
as a disturbance of communication. In the 
present study, therefore, the hypothesis was 
adopted that mental illness requiring hos- 
pitalization is characterized by a breakdown 
of communication, and that improvement is 
accompanied by more effective communica- 
tion of the patient with significant persons in 
his surroundings (5). Human communication 
can be examined only in a social context (7), 
and a method has been developed, the Inter- 
personal Test which we have described in 
detail elsewhere (7), making it possible to 
measure the effects that communication has 
had upon the patient in a two-person situa- 
tion. By repeating the tests in the course of 
several months, changes in the patient’s ways 
of communicating likewise can be detected. 


Method and Subjects 


To test the relationship between rated im- 
provement and effectiveness of communica- 
tion, twenty-five psychiatric inpatients and 
the relatives accompanying them to the hos- 
pital were given the Interpersonal Test at 
the time of admission² and again at the time
of discharge. The Interpersonal Test consists 
of Q sorts (9) of cards on which are typed 
simple statements bearing upon actions, mo- 
tivations, intentions, and moods occurring in 
two-person situations. The statements were 
classified into the following categories: 


Statements Referring to Action 


1. Simple action statements of varied levels of ab- 
straction. 

Examples: “I argue with him,” “I reproach him.” 

2. Actions which denote intent. Example: “I try 
to reassure him.” 

3. Actions which infer intent. Example: “He tries 
to reassure me.” 

4. Actions with built-in effect. Example: “He em- 
barrasses me.” 

5. Action with inferred effect. Example: “I bore 
him.” 

Statements Referring to Feelings 

6. Subjective feelings. Example: “I like him.” 

7. Inferred feelings. Example: “He is ill at ease 
with me.” 


Statements Referring to Attitudes or Expectations 


8. Subjective attitudes. Example: “I respect him.” 
9. Inferred attitudes. Example: “He trusts me.” 


Interactive Statements Bearing upon 
Personality “Traits” 


10. Interpersonal trait variability. Example: “I 
have difficulty making decisions when with him.” 


The method is based on reciprocal sorts in 
which each person sorts the cards twice— 
once as applied to self and once as applied to 
the other person. The results are then paired 
in the following fashion: Statements sorted 
by Person A (what “I do when with him”)
are paired with the corresponding statement 
of Person B (what “he does when with me”), 
in this instance the Patient “I-him” with the 
Relative “He-me.” Thus, the agreement and 
disagreement of two persons about each other 
can be compared in a standardized form. In- 
formation derived from the sortings describes 
functioning for any one occasion, and re- 
peated sortings reflect change over time. All 
patients who were accessible to the testing 
procedure on admission were seen.³ Fifteen


2 Both patients and relatives were seen within one 
week of admission. 

3 Actually, many more than twenty-five patients
and relatives were seen initially, but time restric- 
tions on the study made retesting of later admis- 
sions impossible. Results are reported for those pa- 







women and ten men participated in the study. 
Their mean age at the time of admission was 
33 years, with a range from 18 to 57 years. 
Mean length of hospitalization was five 
months, with a range from 2 to 11½ months.
Husbands of eleven of the woman patients 
were the participating relatives, and the other 
four, all of whom were unmarried, were tested 
regarding their mothers. In the group of ten 
men, seven wives and three mothers partici- 
pated. 

Among the patients were included cases 
with different diagnoses who received vari- 
ous therapeutic measures during their course 
of hospitalization. Sixteen of the patients 
were diagnosed as having some type of 
schizophrenic reaction, eight as depressive 
reaction, and one merely as “psychotic re- 
action.” Seventeen of the patients were given 
some kind of somatic therapy in addition to 
psychotherapy while the other eight had psy- 
chotherapy only. We were not concerned with 
the differential improvement rates for diag- 
nostic groups or treatment methods. Our 
focus was on the communicative changes 
which accompanied improvement, whatever 
the initial and final states and regardless of 
the kind of treatment or the chronological 
time that intervened. Patient and relative 
pairs could thus serve as their own controls, 
since previous research has demonstrated that 
the Interpersonal Test itself is reliable over 
time (1, 2, 4). 

An assumption of the study was that the 
relative participating in the admission pro- 
cedure was a significant person in the pa- 
tient’s interpersonal sphere. In every case, 
the patient had been living with the relative 
concerned up to the time of hospitalization. 
It seemed meaningful to ask these people 
“How do you act with each other? What is 
the nature of your relationship?” We might 
be interested, further, in the question of what 
it is about the relationship which results in 
the one person becoming the patient while 
the other remains out of the hospital. 

Clinical indices of improvement were based 
upon rating scales designed for this study 
and filled out by a research psychiatrist from 
information in the clinical charts. The ratings 
were made for reported premorbid status, 


tients and relatives seen both on admission and at 
discharge. 
condition at time of admission and again at
time of discharge. A fourth assessment of 
status was also made approximately one year 
after discharge. The rating scales covered 
such areas of functioning as occupation, home 
and family, friends and community, physical 
and mental status, ward behavior, inferred 
attitude toward self, etc. Patients were rated 
as unimpaired, mildly impaired, or severely 
impaired for each of the various aspects of 
these areas. The psychiatrist assessing im- 
provement attempted to support all ratings 
with specific behavioral descriptions bearing 
upon the area involved. This procedure re- 
quired methodical study of the material in 
the charts and, while it was time-consuming 
and complicated, the resulting assessment of 
functioning probably has greater validity 
than simple over-all impressions. 


Results 


On the basis of these ratings at the time of 
discharge, the patients could be classified as 
improved, moderately improved, and unim- 
proved. Improved patients were those who 
were unimpaired or who had mild impair- 
ment in no more than two areas at the time 
of discharge. Eight of the patients fell in this 
category. Moderately improved patients were 
those who were judged to be still mildly im- 
paired in three or more areas but not severely 
impaired in any at the time of discharge. 
Nine patients were so classified. The remain- 
ing eight patients, called unimproved, were 
still severely impaired in one or more areas 
at the time of discharge in addition to mild 
impairment in others. The ratings at time of 
discharge only were used since they correlated
−.90 with improvement measured by
the change in ratings from admission to dis- 
charge. 
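The three-way classification at discharge can be written as a simple decision rule. The sketch below takes the number of areas rated mildly impaired and the number rated severely impaired. The function and argument names are hypothetical, and the order in which the conditions are tested is an assumption drawn from the wording of the definitions above.

```python
def classify_at_discharge(mild_areas: int, severe_areas: int) -> str:
    """Classify a patient from counts of impaired areas at discharge.

    mild_areas   -- number of areas rated mildly impaired
    severe_areas -- number of areas rated severely impaired
    """
    if mild_areas < 0 or severe_areas < 0:
        raise ValueError("area counts cannot be negative")
    # Any severe impairment places the patient in the unimproved group.
    if severe_areas >= 1:
        return "unimproved"
    # Mild impairment in three or more areas, none severe.
    if mild_areas >= 3:
        return "moderately improved"
    # Unimpaired, or mild impairment in no more than two areas.
    return "improved"


print(classify_at_discharge(mild_areas=2, severe_areas=0))
print(classify_at_discharge(mild_areas=4, severe_areas=0))
print(classify_at_discharge(mild_areas=1, severe_areas=1))
```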

Analysis of Interpersonal Test correlations⁴
supported the hypothesis that improved
patients would agree better with their
relatives about the nature of their interrela- 
tionship than unimproved patients. Correla- 
tions for each half of the test (Pt. “I-him” 
with Rel. “He-me” and Pt. “He-me” with 
Rel. “I-him”) were computed for each test- 
ing (at admission and at discharge). From 
these four measures of agreement, three 


4 We are grateful to Mrs. Sarah Dean for her par-
ticipation in the statistical analysis of the data. 


measures of change or shift in agreement 
were derived from the data. 

These measures were the “I-him Shift,” 
the “He-me Shift,” and the sum of these two, 
called the “Total Shift.” They were computed 
as follows: 


Correlations were converted to z scores 
(3) and the absolute change in z score from 
Time I to Time II (admission to discharge) 
was calculated. For example, Patient L. had 
the following correlations with his wife: 


Pt. “I-him” with Rel. “He-me”

        A: Time I       B: Time II
         r     z         r     z
        36    38        65    78

Pt. “He-me” with Rel. “I-him”

        C: Time I       D: Time II
         r     z         r     z
        44    47        62    73

I-him Shift = B − A = 40
He-me Shift = D − C = 26
Total Shift = (B + D) − (A + C) = 66
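The shift scores for Patient L. can be reproduced with a short sketch. It assumes that the z values are Fisher's transformation of r (the inverse hyperbolic tangent), that they were rounded to two decimals as in a printed table before differencing, and that shifts are expressed in hundredths of z with the decimal point dropped. The function names are illustrative only.

```python
import math


def z_points(r: float) -> int:
    """Fisher z of a correlation, in hundredths, rounded like a printed z table."""
    return round(100 * math.atanh(r))


def shift_scores(r_a: float, r_b: float, r_c: float, r_d: float):
    """Return (I-him Shift, He-me Shift, Total Shift) in hundredths of z.

    r_a, r_b -- Pt. "I-him" with Rel. "He-me" at Time I and Time II
    r_c, r_d -- Pt. "He-me" with Rel. "I-him" at Time I and Time II
    """
    i_him = z_points(r_b) - z_points(r_a)
    he_me = z_points(r_d) - z_points(r_c)
    return i_him, he_me, i_him + he_me


# Correlations reported for Patient L. and his wife.
print(shift_scores(0.36, 0.65, 0.44, 0.62))  # (40, 26, 66)
```

Rounding each z before taking differences matters here: differencing the unrounded z values for the He-me half gives about 25 points rather than the 26 reported.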


Means of the three Shift scores were calcu- 
lated separately for the three patient groups. 
Table 1 shows the number of patients in each 


Table 1


Shifts in Agreement with Relatives on Interpersonal
Test for Patients Rated According to
Degree of Improvement

                                 Moderately
                    Unimproved    Improved    Improved
I-him Shift
  High agreement         0            5           7
  Low agreement          8            4           1
He-me Shift
  High agreement         1            5           6
  Low agreement          7            4           2
Total Shift
  High agreement         1            5           6
  Low agreement          7            4           2
Table 2


Mean Differences on Shifts in Agreement for Patients
Rated According to Degree of Improvement

                                Agreement
Patient Groups           Total Shift   I-him Shift   He-me Shift
Improved                     +49           +35           +14
Moderately Improved          +30           +17           +13
Unimproved                   −25           −16           − 9

Significance of Differences

                          Total Shift   I-him Shift   He-me Shift
                           d     p       d     p       d     p
Improved vs Mod. Imp.     19    NS      18    NS       1    NS
Improved vs Unimproved    74   <.01     51   <.001    23   <.10
Mod. Imp. vs Unimproved   55   <.02     33   <.05     22   <.10





rated group falling above and below the me- 
dian of the total sample. On all shift meas- 
ures both improved groups differed from the 
unimproved group but not from each other, 
as shown in Table 2. 
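The mean differences in Table 2 follow directly from the group means. The sketch below recomputes each pairwise difference from the mean shift scores printed in that table (in hundredths of z); the dictionary layout and function name are illustrative, and the significance levels themselves are not recomputed because the group variances are not reported.

```python
from itertools import combinations

# Mean shift scores by group, as printed in Table 2.
means = {
    "Improved": {"Total": 49, "I-him": 35, "He-me": 14},
    "Moderately Improved": {"Total": 30, "I-him": 17, "He-me": 13},
    "Unimproved": {"Total": -25, "I-him": -16, "He-me": -9},
}


def mean_differences(group_means):
    """Difference in mean shift for every pair of groups, per measure."""
    diffs = {}
    for g1, g2 in combinations(group_means, 2):
        diffs[(g1, g2)] = {
            measure: group_means[g1][measure] - group_means[g2][measure]
            for measure in group_means[g1]
        }
    return diffs


diffs = mean_differences(means)
for pair, d in diffs.items():
    print(pair, d)
```

In each group the Total Shift mean equals the sum of the I-him and He-me means, as expected from the definition of Total Shift.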


Improved Group 


Agreement with the relative increased an 
average of 49 points on Total Shift for the 
improved group. This increase arose more 
from the I-him Shift than from the He-me 
Shift. Six of the patients in the group fell 
above the median and two below on both 
Total Shift and He-me Shift, while seven 
were above and one below on I-him Shift. 
The improved group showed greater in- 
creased agreement with their relatives than 
did the unimproved group.⁵ There were consistent
mean differences between the im-
proved and moderately improved groups on 
all three measures, but these differences were 
not statistically significant. 


Moderately Improved Group 


Agreement with relatives increased an av- 
erage of 30 points on Total Shift for the mod- 
erately improved group. These patients also 
differed on all three Shift measures from the 
unimproved group, but less markedly than 
the improved group. On all three measures, 


5 Based on t test for difference between means.
five of the group were above the median and
four below. 


Unimproved Group 


Agreement with relatives decreased 25 points 
from admission to discharge for the unim- 
proved group. As reported above, these dif- 
ferences separated the group from both im- 
proved groups on all three measures. On the 
I-him Shift all eight of the unimproved pa- 
tients fell below the median of the total sam- 
ple. Seven of the eight were below the me- 
dian on both the Total Shift and the He-me 
Shift. 

It is apparent that increased agreement 
with relatives on the Interpersonal Test is
closely related to clinically rated improve- 
ment from psychiatric illness. 

Other characteristics of the sample, namely 
sex, diagnosis, type of treatment, and length 
of hospitalization were examined for their in- 
fluence on the results. Length of hospitaliza- 
tion was the only one of the four showing a 
relationship; the unimproved patients were 
hospitalized longer, on the average, than the 
patients rated as only moderately improved. 
Since the improved group fell between these 
two on length of hospitalization, it was ap- 
parent that the difference did not account for 
the test findings. 


Discussion 


It is possible to examine the individual sets 
of Q sorts for the sources of agreement and 
disagreement between patient and relative. 
Items of disagreement vary from pair to pair, 
however, and it would seem that the critical 
factor involved is the failure or inability of 
unimproved patients to communicate effec- 
tively or, as Ruesch (8) has put it, “to cor- 
rect their information according to feedback.” 
The use of a close relative to measure a pa- 
tient’s communicative effectiveness is a severe 
test, since improved communication might 
occur with others in the environment but 
not with the spouse or parent. These relatives 
are frequently the focus of long, deeply 
entrenched communicative distortions which 
ultimately culminate in severe disturbance 
and are not readily corrected. Block (1), 
Block and Bennett (2), and Kalis (4) have 
demonstrated that people assume different 
roles at the interpersonal level, so that the 
relationship with the relative tested reflects
the effectiveness of only one such role. 
Ideally, corresponding use of the test with 
therapists and others in close association with 
the patient would provide a more stable index
of that patient’s increased communicative ef- 
fectiveness. The simplicity of the design that 
was used, involving only the relative, serves 
to emphasize the sensitivity of the test for 
measuring improvement, and suggests other 
possible uses of the test in a clinical setting. 

Discrepancies between clinical ratings of 
improvement and agreement with others on 
the Interpersonal Test would be of particu- 
lar interest. If the test reflects improvement 
while staff opinions disagree, or vice versa, 
exploration of possible sources of the dis- 
agreement could lead to a better understand- 
ing of the patient, his illness, and the nature 
of his interactions with others. Use of the test 
with staff members might also lead to better 
understanding of staff interactions with the 
patient. 

Attempts to relate the test findings to 
status of patients one year after hospitaliza- 
tion were not illuminative. Neither communi- 
cative ability at discharge nor clinical ratings 
were predictive of a patient’s future adjust- 
ment. A retrospective analysis of test differ- 
ences between patients who remained symp- 
tom-free and those who were rehospitalized 
might suggest some predictive indices. Tests 
at the time a patient leaves the hospital 
cannot anticipate the difficulties he will en- 
counter in the future, however, and it seems 
sufficient at present to evaluate a person’s 
effectiveness at a given time. Neither does 
the Interpersonal Test predict which persons 
entering a psychiatric hospital will improve 
and which will not. It merely measures some 
of the communicative correlates of improve- 
ment, when such improvement has occurred. 


Summary 


This study was designed to measure changes 
in communicative behavior of psychiatric pa- 
tients during hospitalization and to relate 
such changes to independently derived in- 
dices of clinical improvement. 

Twenty-five hospitalized psychiatric pa- 
tients and the relative accompanying each to 
the hospital were examined at the time of 
the patient’s admission and again at the time 
of discharge. Each pair was given the “Interpersonal
Test,” which consists of reciprocal
Q sorts bearing upon the nature of the
relationship as perceived by the two partici- 
pants and reflecting their respective agree- 
ment and disagreement. 

Clinical improvement was assessed with 
the use of ratings pertaining to various areas 
of the patient’s functioning. These ratings, 
made by a psychiatrist independently of the 
Q sorts, were applied to four different peri- 
ods of the patient’s illness: premorbid status, 
time of admission, time of discharge, and time 
of follow-up approximately one year after 
discharge. 

The results indicate that significant dif- 
ferences exist between patients rated “im- 
proved” and those rated “unimproved.” The 
“improved” groups showed better agreement 
between patient and relative following hos- 
pitalization. 

This investigation demonstrates that meas- 
urement of mutual agreement represents a 
valid technique for assessing clinical improvement;
it further shows that what is called improvement
is in part based upon observation
of changes in communicative behavior. 


Received April 10, 1956. 
Maze Test Reactions After Chlorpromazine¹


S. D. Porteus 
Territorial Hospital, Kaneohe, Hawaii 


In a current investigation of the effects of 
chlorpromazine,² 50 male patients with psychoses
of various types were given 300 mg.
of the drug daily for four months and their 
ward behavior rated at three-week intervals 
by specially devised rating scales of eleven 
traits or trait complexes. All were inmates of 
closed wards, with hospital residence ranging 
from 3 to 57 years. Other therapies such as 
insulin coma and electric shock had been used 
without lasting benefit. Patients can there- 
fore be considered typical chronic psychotics 
such as can be found in the “back wards” of 
any state mental hospital. 

With the approval of the medical director, 
Dr. Robert Kimmich, the “double-blind” ap- 
proach was adopted, the same number of pa- 
tients on another closed ward serving as con- 
trols and receiving placebos, all medication 
being closely supervised by Dr. John Regan 
of the psychiatric staff. The whole design was, 
I believe, a good example of hospital team- 
work, the psychologist assuming direct re- 
sponsibility for the research, the psychiatrist 
for treatment. 

The therapeutic results of chlorpromazine 
will be reported fully in another article. A late 
development in the study, namely, changes in 
Maze Test scores, seemed to be of such im- 
portance from both the psychological and 
psychiatric points of view that it was deter- 
mined to give it priority of publication. With 
regard to the behavior ratings, it will be suffi- 
cient to report here that 63% of the chlor- 
promazine patients showed significant or 
marked improvement. On the other hand, 


1 Study supported in part by a grant from the
James McKeen Cattell Fund, New York. 

2 Smith, Kline and French generously supplied free 
of cost all the chlorpromazine necessary for this re- 
search. 
only 11% of the placebo group showed im-
provement. Thus, it may be stated that, after 
allowing for suggestibility of both raters and 
patients, over 50% of chronic male psychotics 
benefit to a marked degree by continued 
medication with chlorpromazine. 


Maze Test Applications 


The other development arose through what 
was originally regarded as a minor phase of 
the research design, namely, the application 
of the Porteus Maze Tests before and after 
the use of the drug. Unfortunately, only 13 
patients of 50 were found who were amenable 
to testing or whose Maze results had any 
meaning. These cases were tested by T. 
Greenland, a graduate psychology student, 
under the writer’s immediate supervision. 
After four months’ medication, these patients
were retested by the writer, using the prac- 
tice-free extension series of Mazes (10). 
Later, two male and seven female cases were 
added to the group, two of the women having 
had larger dosages for a shorter time, while 
five had had the same dosage but for a pe- 
riod of six weeks instead of four months. 

On the basis of a study by Peters and Jones 
(9) who had found that social or ward be- 
havior improvement after psychodrama-ther- 
apy was reflected in significant gains on the 
Porteus Maze, there was every expectation 
that improvement would be shown by simi- 
larly improved chlorpromazine patients. 

The only study in which Maze results are 
reported seems to be one by Gardner, Haw- 
kins, Judah, and Murphree (5). They stated 
that after chlorpromazine four out of nine 
patients improved in Maze scores as against 
a similar result in eight out of ten reserpine 
patients, while placebo patients declined. 
However, apparently only the standard Maze 
series was used both before and after medi-
cation, and thus an indeterminate amount of 
improvement must be attributed to practice. 


Decline in Maze Scores 


Considering that Peters and Jones reported 
a gain of 2.5 years in Maze Test age as 
against an average gain of only 0.8 year for 
their control group, the writer was very much 
surprised to find a reverse trend for chlor- 
promazine patients who were also socially im- 
proved. Instead of a gain, there was a net 
loss (algebraic sum of gains and losses di- 
vided by N) of 2.06 years. The percentage 
of patients losing in score was 68.2%, 3 or
13.6% remained the same, and 4 or 18.1%
gained. The percentage who gained in the 
study by Gardner et al., was 44%, but it is 
impossible to state how this figure was af- 
fected by practice. 


Comparison With Lobotomy 


The fact that a similar impairment in Maze 
Test performance has been found in every 
important study of the psychometric results 
of psychosurgery makes desirable a review 
of these investigations. The suggestion that 
chlorpromazine acts as a chemical or phar- 
macological lobotomy is, of course, not new. 
An intensive review of the subject by
Dundee states:

The mental effects of prolonged administration of 
chlorpromazine have been called “pharmacological 
frontal lobotomy” (Terzain, 1952). Patients show a 
lack of spontaneous interest in their surroundings, 
are generally immobile, and at first glance seem to 
be heavily drugged. However, the higher psychic 
functions are preserved to a remarkable degree and 
patients are capable of sustained attention and con- 
centration (3, p. 362). 


Other references of the same kind are nu- 
merous and seem to be based mainly on some 
similarities of effect. Both lobotomy and 
chlorpromazine are used for the relief of in- 
tractable pain; both diminish anxiety or self- 
concern; both seem to result in improved ap- 
petite and sudden increase in body weight. 
If in addition it can be shown that the Maze 
reactions are similar in psychosurgery and 
chlorpromazine therapy, thus providing a 
comparison of measurable mental effects, the 
analogy would be more complete and mean- 
ingful. 





The Maze Test and Psychosurgery 


Probably no apology is needed for recount- 
ing Maze Test results after various forms of 
frontal lobe operations. Neurosurgery is usu- 
ally outside the sphere of psychologists’ inter- 
ests. Many of them appear oblivious to the 
relevancy to psychometry of psychosurgical 
findings, even though psychologists, such as 
Landis and his co-workers, have played a vital 
role in evaluation of the mental effects of 
these operations. This lack of interest is 
astonishing considering their bearing on the 
meaning and validity of objective tests and 
how the new approaches to psychiatry open 
up whole fields of usefulness for the psycho- 
logical members of the hospital team. 

My own interest in the matter dates from 
1942 when a happy conjunction of a neuro- 
surgeon, a psychiatrist, and a clinical psy- 
chologist resulted in a study that suggested 
for the first time marked impairment in Maze- 
tested abilities after lobotomy. Porteus and 
Kepner (12) reported these mental changes 
in 1944. The study was continued by Porteus 
and Peters, with publication of another mono- 
graph (13) dealing specifically with the rather 
dramatic validation of the Maze as “a frontal 
lobe brain test.” This study confirmed the 
previous results. Porteus and Kepner had re- 
ported a net loss after lobotomy of 1.94 
years, the decline in score occurring in 76.5% 
of the cases. In the second larger study (55 
cases) the net loss was 1.65 years and af- 
fected 81% of the group at the first or some 
later postoperative testing. A control group 
of 55 criminals (unoperated) made a gain of 
1.6 years on their second application of the 
Maze. Porteus and Peters pointed out that 
social recovery seemed to be closely asso- 
ciated with a pattern based on repeated ap- 
plications of the Maze, namely, a marked 
initial postoperative decline in score, followed 
by steady successive increments of score up 
to and beyond the preoperative level. 


Columbia-Greystone Findings 


Consideration of these findings resulted in 
the inclusion of the Porteus Maze in the bat- 
tery of 35 tests applied in the Columbia- 
Greystone investigation on the effects of 
topectomy (gyrectomy). The study was re- 
ported most adequately in 1949 (1). The
operated group, as a whole, lost 1.21 years
on Maze Test age. The pattern of response 
described by Porteus and Peters as asso- 
ciated with social improvement was noted by 
H. E. King (6) who reported the psycho- 
metric changes as “a marked relation be- 
tween performance in this test and social im- 
provement.” Five out of six patients selected 
by psychiatrists as showing the greatest im- 
provement exhibited the characteristic pat- 
tern of an immediate postoperative loss of a 
year or more in score, followed by a regain 
in performance up to or above the preoperative
level. The successive average scores of patients
discharged from the hospital within a
year were 13.5, 11.4, 13.2, 14.6, and 15 years.
Thus it was clear that an initial loss in Maze- 
tested functions was characteristic of pa- 
tients who suffered excisions of cerebral cortex 
in various frontal areas, particularly areas 8, 
9, 10, 46, with practically no deficits in re- 
gard to area 11 in the orbital region. 

The second Columbia-Greystone project in- 
volved a variety of surgical insults to the 
frontal lobes, including two types of venous 
ligation, anterior and posterior, thalamotomy, 
thermocoagulation of portions of areas 9 and 
10, and transorbital lobotomies. The average 
loss in Maze Test age for all cases was 1.5 
years postoperatively, but ranged from 0.8 
year in the transorbital lobotomy group to 
4.7 years in the thalamotomies. The loss in 
the more posterior venous ligation patients 
was much more severe than for the more an- 
terior operation. The results were reported in 
book form in 1952 (2). 

The third study of maze scores was de- 
scribed by Sheer (15) who, like King, worked 
under Landis. This investigation concerned 
orbital and superior areas of the frontal cor- 
tex. Again the posterior-superior situs resulted 
in the most serious Maze deficits, with less 
apparent practice improvement for successive 
applications than could be observed in the 
control group. The orbital group lost only 
1.13 years in test age, the superior group 2.88 
years. The Wechsler-Bellevue showed a loss 
of 2 IQ points, but if the Maze results are 
expressed similarly, the Maze Test loss was 
16 points. 

All three Columbia-Greystone project findings
were summarized by Landis at the third
Psychosurgical Conference in New York in
1951: 


In the battery of tests which was used in the first 
Greystone, the second Greystone, and the New York 
State Project, we included both the standard Wech- 
sler-Bellevue and the Porteus Maze Test. In the test- 
by-test analysis of the results which were obtained, 
the only intelligence test which showed a uniform or 
almost uniform loss during the first month after op- 
eration compared to the preoperative performance 
on this battery of tests was the Porteus Maze Test. 
Dr. Porteus had previously reported this sort of loss 
after lobotomy. We confirmed his finding that a 
brain operation performed on the frontal lobes gives 
rise to an immediate postoperative loss in mental 
age of 1 to 2 years in some 80 percent of psycho- 
surgery patients (7, p. 109). 


In all probability, the average loss would 
have been much greater if a practice-free 
form of the test had been available for postoperative
examinations. The question of permanency
of the defects in Maze-tested abilities
could not be settled by the Columbia-Greystone
studies, since no definite allowance
could be made for the effect of practice. That 
it was an important factor was shown by 
Sheer (15) who gave the Maze twice before 
operation. The control groups increased 1.60 
score points before operation, and the pa- 
tients to be operated, 1.79 points. Thus the 
postoperative testing became the third appli- 
cation of the Maze, and what the practice 
effects would amount to under those condi- 
tions, no one knows. Again it is necessary to 
point out that these differences are expressed 
in years of test age whereas results for the 
Wechsler-Bellevue are all reported in IQ 
points. 

Robinson Study 


The only evidence so far presented that 
bears on the problem of permanency of Maze 
Test deficits has been supplied by Robinson 
(14) who tested 68 of the Freeman-Watts 
patients, the elapsed time since lobotomy be- 
ing over three years. This was the first ap- 
plication of the Maze, so that no practice ef- 
fects were involved. She used as controls 12 
patients who had been discharged from the 
hospital without benefit of lobotomy. In a 
private communication,⁸ Dr. Robinson expressed
the opinion that 50 percent of the lobotomy
cases could get along in the community
“without control or supervision.” On
a rough scale of social participation the correlation
with Maze scores was .42. Considering
all the variables involved, this was as high
as I should expect.

8 Letter dated Sept. 10, 1951.

The Maze score of the controls as reported 
by Robinson in the second edition of the 
Freeman-Watts book (4) was 13.9 years, as against
10.6 years for the lobotomy patients, a deficiency
of 3.3 years. Expressed as IQs for
adults the figures would be 98 and 75 re- 
spectively. Ten patients who suffered “radi- 
cal” (more posterior) lobotomies tested 9.6 
years, IQ 68. It may be of interest to note 
that the average Maze Test age of these pa- 
tients was below that of Australian aborigines 
tested by the writer. All the evidence avail- 
able shows that the more anterior the frontal 
lobe injury the less the Maze deficits. Trans- 
orbital lobotomy causes a much less signifi- 
cant loss. 


Comparison with Chlorpromazine Findings 


From Sheer’s table of scores of 36 cases 
used in the New York State Project, I have 
calculated their average postoperative test 
age to be 11.12 years; the Porteus-Kepner 
(N=17) age was 9.41 years, while the 
chlorpromazine group scored 9.5 years. It 
should be noted that, in the first named 
study, two adult tests were used, raising the 
maximum score two years. For purposes of 
ready comparison I have calculated the losses 
following frontal lobe operations and have 
listed the chlorpromazine cases thereafter. It
is probable that if the practice-free extension
series had been used for the retesting of psychosurgical
cases their deficits would have been larger
postoperatively. The general situation as regards
Maze Test impairment in the various
investigations appears in Table 1.


Table 1

Loss on Maze Test in Several Studies of Frontal Lobe
Operations, and in Chlorpromazine

Study                            Result

Columbia-Greystone I             1.2 years postoperative loss
Columbia-Greystone II            1.5 years postoperative loss
New York Brain Study
    Orbital cases                1.13 years postoperative loss
    Superior topectomy           2.88 years postoperative loss
Robinson study (N 18)            3.3 years below controls
Porteus-Kepner (N 17)            1.91 years postoperative loss
Porteus-Peters (N 16)            1.81 years postoperative loss
Chlorpromazine group (N 22)      2.06 years postmedication loss


Personality Reactions in the Maze 


In discussing the reactions of psychosurgery 
patients, Landis and Erlick (8) described 
them in these terms: “It is as though there 
were a decrease in the vigilance or wakeful- 
ness of the patient, the changes being as 
varied as those which might occur in any in- 
dividual that was drowsy or sleepy while 
taking a test.” It should be said that while 
drowsiness is typical of chlorpromazine pa- 
tients in the early stages of medication, it 
did not appear to be present at the time of 
testing. Nevertheless, there were very evident 
losses in foresight. Some specific reactions to- 
wards mistakes in the Maze threading are 
worth noting. Though concern was frequently 
voiced, the examiner felt that it was rather 
superficial. One or two patients were very 
apologetic for repeated unsuccessful trials, 
stating that they were sorry to waste so much 
of the examiner’s paper. Others would com- 
ment on their own mental dullness but showed 
little or no real concern. As in the case of 
lobotomy patients there was a tendency to 
give up, or relax vigilance when the test be- 
came difficult, even when the utmost care 
and prudence had been used in the lower 
level tests. It was as if the patient said, “I 
have done all I can. The rest does not mat- 
ter.” This attitude was reminiscent of that 
of Jacobsen’s lobotomized chimpanzees, who, 
when the task was beyond them, shrugged 
their simian shoulders and went on with 
something else. 

However, there is no typical or general re- 
action to the Maze of either psychosurgical 
or chlorpromazine patients but a great va- 
riety of attitudes. Some are nonchalant in the 
face of failure, some are cautious, conscienti- 
ous workers. Some appear to think a great 
deal depends upon their success or failure, but 
most seem to enjoy the experience; some are 
apologetic, others seem emotionally flat. In 
comparison with normals the reactions of pa- 
tients seem more positively notable. In a word, 
they are “different.”


Table 2

Maze Scores Before and After Lobotomy and Chlorpromazine

      Lob. Porteus-Kepner         Lob. Porteus-Peters         Chlorpromazine
Case  Pre-Op.  Post-Op.  Diff.    Pre-Op.  Post-Op.  Diff.    Pre-Med.  Post-Med.  Diff.

   1     13.5      14.5   +1.0        8.5      10.5   +2.0        6.0        10.5   +4.5
   2     14.5      15.5   +0.5       10.5      14.5   +1.0        6.0         6.5   +0.5
   3      8.5       8.5    0.0       14.5      15.0   +0.5       14.0        14.5   +0.5
   4      6.0       6.0    0.0       11.0      11.0    0.0       16.0        16.5   +0.5
   5     14.5      13.5   -1.0       10.5      10.5    0.0       16.0        16.0    0.0
   6      6.0       5.0   -1.0       10.0       9.5   -0.5        6.0         6.0    0.0
   7     12.5      11.5   -1.0        7.0       6.5   -0.5       16.5        16.5    0.0
   8      5.0       4.0   -1.0       10.0       9.0   -1.0        6.5         6.0   -0.5
   9     11.5       9.5   -2.0        8.5       7.5   -1.0       13.5        13.0   -0.5
  10     13.0      11.0   -2.0       12.0       9.0   -3.0        8.0         7.5   -0.5
  11      9.0       7.0   -2.0        9.5       6.0   -3.5       12.0         9.5   -2.5
  12     14.0      11.0   -3.0       11.0       7.0   -4.0       15.5        12.5   -3.0
  13     14.0      10.5   -3.5       14.0       9.5   -4.5       15.0        11.5   -3.5
  14     11.0       7.0   -4.0       14.0       9.5   -4.5       15.0        10.0   -5.0
  15     13.0       8.5   -4.5       13.0       8.5   -4.5       14.0         7.5   -6.5
  16     14.0       9.5   -4.5       13.5       8.0   -5.5        9.0*        7.5   -1.5
  17     13.5       8.5   -5.0                                    9.0*        7.0   -2.0
  18                                                             11.0*        8.5   -2.5
  19                                                             13.5*        8.5   -5.0
  20                                                             15.5*       10.0   -5.5
  21                                                             15.0*        9.0   -6.0
  22                                                             15.5*        8.5   -7.0

Loss     1.91 years                  1.81 years                  2.06 years
Cases    76.5%                       68.75%                      68.2%

* Female patients.


Commonly Observed Tendencies 


Only two specific tendencies seem to the 
writer to be more general with psychosurgi- 
cal and chlorpromazine patients on the one 
hand, than with normals on the other. The 
first seems to be a tendency to come to a sud- 
den full stop when the more complex levels of 
the test are reached. This was called “shatter 
effect” by Brundage * but our groups, espe- 
cially chlorpromazine cases, are at present 
too small to allow definite comparisons with 
normals. 

The other trend seems to be more fre- 
quently observable and more susceptible to 
analysis. This is the tendency to repeat the 
same errors, that is, enter the same blind 
alley more than once. Naturally, this cannot 
occur except when there is failure in two 
trials in tests up to and including year XI. 
In tests XII, XIV, where four trials are allowed
and in Adult I (three trials), the tendency
to repeat errors can be more readily
detected.

* In a personal communication, Dec. 26, 1943.

In the Porteus-Peters study, 20.8% of the 
patients had two or more repeated errors in 
their prelobotomy tests, 37.1% in their first 
postlobotomy Maze examination. The chlor- 
promazine cases also showed 20% with two 
or more repeated errors before medication, 
45% after medication. The average number 
of all repeated errors was 1.1 before medi- 
cation, 1.55 after. Thus a tendency to repeat 
errors seems to be characteristic of psychotic 
patients but is accentuated after both lob- 
otomy and chlorpromazine. 


Individual Scores 


To give a better comparative view, I have 
shown in Table 2 the individual Maze scores 
of these groups, 17 cases of the Porteus- 
Kepner study, 16 “much improved” cases of 
the Porteus-Peters monograph, and the 22
chlorpromazine patients. In the last named
group, the female patients’ scores are given 
after the males. The number (seven) is small 
but the results indicate that all lost ground 
on the Maze and the deficits were greater 
than for males. This trend calls for further 
investigation. Again it should be stressed that 
had the practice-free form been used in the 
first two studies, the Maze impairment would 
undoubtedly have been greater. 
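
The comparison of male and female chlorpromazine patients can be rechecked from the pre- and postmedication scores in Table 2. The Python sketch below is offered only as an illustration, under the assumption that cases 1 to 15 are the males and the seven starred cases (16 to 22) are the females; the names used are hypothetical.

```python
from statistics import mean

# (premedication, postmedication) Maze Test ages transcribed from Table 2.
males = [(6.0, 10.5), (6.0, 6.5), (14.0, 14.5), (16.0, 16.5), (16.0, 16.0),
         (6.0, 6.0), (16.5, 16.5), (6.5, 6.0), (13.5, 13.0), (8.0, 7.5),
         (12.0, 9.5), (15.5, 12.5), (15.0, 11.5), (15.0, 10.0), (14.0, 7.5)]
females = [(9.0, 7.5), (9.0, 7.0), (11.0, 8.5), (13.5, 8.5),
           (15.5, 10.0), (15.0, 9.0), (15.5, 8.5)]


def mean_change(pairs):
    """Average post-minus-pre change in test age (negative means a loss)."""
    return mean(post - pre for pre, post in pairs)


def all_lost(pairs):
    """True when every case scored lower after medication than before."""
    return all(post < pre for pre, post in pairs)


print(f"males:   mean change {mean_change(males):.2f} years, "
      f"all lost: {all_lost(males)}")
print(f"females: mean change {mean_change(females):.2f} years, "
      f"all lost: {all_lost(females)}")
```

With so few female cases the difference in mean change is descriptive only, which is in keeping with the call for further investigation.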

Original data are given in the table for the 
purpose of showing that the decline in Maze 
scores occurs at all levels and has a wide 
range of variation. In other words, the effect 
of chlorpromazine, like that of psychosurgery, 
differs very greatly in different individuals, 
illustrating the tremendous complexity of the 
neurological bases of human behavior. 


Neurological and Practical Bearings 


Questions as to where in the central nerv- 
ous system chlorpromazine exerts its influence 
are only relevant to the purpose of this paper 
insofar as the data cast some light also on the 
localization of Maze-tested functions. Dundee,
in his review (3), states: “Terzain (1952)
concludes that the effects of chlorpromazine 
represent depression of the reticular forma- 
tion, particularly the sensory and autonomic 
spheres. . . . From an analysis of the effects 
of chlorpromazine on the central and auto- 
nomic systems, Aron (1954) has come to the 
conclusion that its site of action is probably 
in the hypothalamus. .. .” Since lobotomy 
interferes with the thalamocortical connec- 
tions, this extends the parallel with chlor- 
promazine. Topectomy of areas 9 and 10 also 
causes loss of anxiety and vigilance as meas- 
ured by the Maze, and quite conceivably, the 
mental alertness necessary in Maze threading 
is mediated by the reticular formation. Other 
brain areas are probably involved in Maze 
performance, this matter having recently been 
discussed by the writer (11). The chlorpro- 
mazine and lobotomy patients are, as re- 
gards Maze performance, literally “asleep at 
the switch.” In other terms, both treatments 
interfere with capacity for prehearsal. 

The bearing of these findings on psycho- 
pathology is clear and significant, but they 
have no bearing on the legitimate use of 
chlorpromazine, especially in mental hospital
practice. If the patient is severely or completely
disabled by psychosis and can be significantly
improved by chlorpromazine, the
occurrence of some deficits in planning and 
mental alertness will surely not deter the 
psychiatrist from administering the drug. The 
drawbacks would be far outweighed by the 
advantages. As regards the benefit to hos- 
pitalized psychotics, the drug seems to usher 
in a new era in treatment. But if permanent 
deficits follow, then the use of the drug in 
minor mental ailments or with emotionally 
disturbed children would certainly be contra- 
indicated. It was, no doubt, such casual ap- 
plications which the Research Committee of 
the American Psychiatric Association had in 
mind when it issued its recent warning that 
the indiscriminate use of the tranquillizing 
drugs constituted a public danger. 


Further Research 


In relation to our own study, two steps in 
further research are obvious, in addition to 
confirmation of present findings with larger 
groups. The first is to discover the relation of 
social improvement to Maze Test patterns of
response as an aid to patient selection. The
second and more important is to find out 
whether the deficits are permanent or transi- 
tory. It is quite possible that on this point the 
similarity of pharmacological lobotomy to 
psychosurgery may break down. It may well 
prove to be the case that unlike psychosur- 
gery, the effects are reversible. But only care- 
ful and prolonged research will provide an- 
swers to this most important question. We 
have already suggested that the differences in 
male and female Maze reactions after chlor- 
promazine should be further investigated. Ex- 
tension of our study to the effects of others 
of the so-called “ataractic” drugs is obviously 
desirable. Samples of the behavior scales and 
tests will be supplied on request. 


Summary 


In connection with a study of behavior 
changes following long continued use of 
chlorpromazine with chronic psychotic pa- 
tients, the Porteus Maze Test was applied to 
13 males who were accessible to testing and 
whose scores showed any significance. After
four months’ medication the test was reapplied.

In spite of measured social improvement, 
there was a marked drop in scores. The num- 
ber of cases was then augmented by nine, 
seven of whom were female patients who had 
taken an equal dosage of chlorpromazine for 
six weeks. Analysis of postmedication results 
for the whole group of 22 cases revealed an 
average Maze Test deficit of 2.06 years, af- 
fecting over 68% of patients. 

This evidence that chlorpromazine acts as 
“a pharmacological lobotomy” is summarized 
by means of a review of all important psycho- 
surgical studies in which the Porteus Maze 
was used, and the deficits found are com- 
pared with those now demonstrated to follow 
prolonged chlorpromazine medication. 

The point is emphasized that any parallel 
with psychosurgery has little bearing on the 
use of the drug with hospital mental patients. 
It does, however, serve as a strong contra- 
indication towards its indiscriminate use for 
lesser mental disorders, and with children. 
Other implications are briefly discussed and 
future steps in research on the subject are
indicated. 


Received October 15, 1956. 
Early Publication.


A Cross Validation of Starer’s Test of
Cultural Symbolism¹


William D. Winter 
San Jose State College 


and James W. Prescott 
Marquette University 


In a recent study (1), Starer described a 
technique for investigating whether or not 
subjects (Ss) tend to classify elongated, 
pointed, or angular designs as male symbols 
and round, curved, or containing designs as 
female symbols. In his procedure the S is 
asked to match ten designs, half of which are 
male and half female according to the above 
definitions, with ten first names, five male and 
five female. A correct matching consists of 
the S’s placing together a name and a design
of the same sex. In his original investigation,
Starer found that psychotic and normal
Ss correctly matched the designs and names 
at a level significantly greater than expected 
by chance, suggesting the existence of a gen- 
erally accepted sexual symbolism in this cul- 
ture. It was Starer’s impression that the in- 
ability to match symbols correctly is related 
to psychotic confusion. 

1 An extended report of this study may be obtained
without charge from William D. Winter,
Psychology Department, San Jose State College,
California, or for a fee from the American Documentation
Institute. Order Document No. 5087 from
ADI Auxiliary Publications Project, Photoduplication
Service, Library of Congress, Washington 25,
D. C., remitting in advance $1.25 for microfilm or
$1.25 for photocopies. Make check payable to Chief,
Photoduplication Service, Library of Congress.

In an effort to repeat Starer’s study and to
obtain evidence on this latter observation, 52
male and 55 female hospitalized mental patients
were individually tested using Starer’s
technique, and were also given the group form
of the MMPI. The distributions of number of
correct matchings for both male and female
patients significantly differed from chance expectations
(p < .001), and the obtained frequencies
were similar to those reported by
Starer.
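
What chance expectations might look like for this task can be sketched as follows. The Python example assumes, hypothetically, that each of the ten designs is paired with exactly one of the ten names and that a guessing S pairs them at random. Neither Starer's exact instructions nor the chance model actually used are given in this report, so the figures are illustrative only.

```python
from fractions import Fraction
from math import comb


def chance_distribution(n_per_sex=5):
    """P(number of correct matchings) under random one-to-one pairing."""
    total = comb(2 * n_per_sex, n_per_sex)
    dist = {}
    for k in range(n_per_sex + 1):
        # k male designs receive male names; the remaining male names go to
        # female designs, so exactly k female designs receive female names.
        # The number of correct matchings is therefore 2k.
        ways = comb(n_per_sex, k) * comb(n_per_sex, n_per_sex - k)
        dist[2 * k] = Fraction(ways, total)
    return dist


dist = chance_distribution()
expected = sum(score * p for score, p in dist.items())
p_eight_or_more = sum(p for score, p in dist.items() if score >= 8)

for score, p in dist.items():
    print(f"{score:2d} correct: {float(p):.4f}")
print("expected number correct:", expected)
print("P(8 or more correct):", float(p_eight_or_more))
```

Under this assumed model only even numbers of correct matchings can occur, and a guessing S averages five correct out of ten.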

However, the variable of number of correct 
matches was not significantly related to any 
of the commonly used MMPI scales, includ- 
ing those usually associated with psychotic 
thinking, such as F and Sc. This finding, to- 
gether with the fact that there is no signifi- 
cant difference between our female psychotic 
patients and Starer’s normal female Ss in the 
number of correct matchings, fails to sub- 
stantiate Starer’s clinical impression that in- 
correct matches are related to psychosis. This 
lack of relationship with the MMPI leaves 
us in doubt as to whether the ability to 
match sexual symbols correctly is related to 
specific personality factors or is simply a re- 
flection of the membership of an individual 
in this society. 

Brief Report. 
Received September 24, 1956.


Correlation of a Modified Form of Raven's Progressive
Matrices (1938) with the Wechsler
Adult Intelligence Scale¹


Julia C. Hall 


Veterans Administration Hospital, Bronx, New York 


Among brain-damaged patients motor im- 
pairment is often so severe that the perform- 
ance tests of the Wechsler Adult Intelligence 
Scale (WAIS) are inappropriate as measures 
of intellectual function. Since research find- 
ings (6) indicate that the performance tests 
are more apt to reflect the effects of brain 
damage, it cannot be assumed that the WAIS 
Verbal Scale gives a comprehensive survey of 
the intellectual function of the brain-damaged 
individual, and substitutes for the performance
tests would, therefore, be very useful. Raven’s
Progressive Matrices (7, 9) is one potentially 
useful substitute because performance on the 
test is not affected by motor impairment and 
some of its characteristics suggest that it is 
apt to be a more sensitive indicator of im- 
paired function than are verbal tests. The 
relevant characteristics are as follows: (a) 
Matrices is a relatively univocal test of rea- 
soning (12), and ability to reason is thought 
to be highly susceptible to brain damage; (b)
Matrices and the WAIS performance tests 
have similar age decline curves (8); (c) re- 
ported correlations between Matrices and the 
Block Design test of the Wechsler-Bellevue 
are considerably higher than are correlations 
with the tests of the Verbal Scale (1, 4). The 
investigation reported here studied the cor- 
relation between the WAIS and a modified 
form of the Progressive Matrices (1938). 


1 The author wishes to express her appreciation to 
the staff and trainees of the Clinical Psychology Sec- 
tion, Bronx VA Hospital, for their generous coopera- 
tion in giving the tests on which this study is based 
and to Drs. H. L. Flowers, Chief, Neuropsychiatric 
Service, and R. S. Morrow, Chief, Clinical Psychol- 
ogy Section, for their sustained support. 


Procedure 


In its standard form Matrices is an un- 
timed, 60-item test which usually takes 40-50 
minutes. Since in the clinical situation 50 
minutes devoted to so homogeneous a test as 
Matrices is excessive, a 30-item form, using 
an odd-even method of selection, was made 
up and a 20-minute time limit imposed. Some 
deviations from odd-even selection were made 
in order to secure items less demanding of 
visual acuity. The specific items used are 
shown in Table 3. Since a highly speeded test 
was not desired, the time limit chosen was 
one which pilot-study findings indicated was 
sufficient for most individuals to complete 
the test. 

The subjects of the study are all those pa- 
tients tested on the Neuropsychiatric Service 
during the period from April to December, 
1955, who passed the research exclusion cri- 
teria. The exclusion criteria were: (a) not 
psychotic, (b) no evidence of brain damage,
(c) if Negro, not educated in a southern 
state, and (d) not educated in a foreign coun- 
try. The brain-damage criterion was inter- 
preted stringently; that is, even in the ab- 
sence of neurological symptoms, either a 
history of head trauma, or of shock therapy, 
or an abnormal electroencephalogram was 
sufficient for exclusion. Eighty-two individu- 
als (all males) fulfilled the research criteria. 


Results 


The data in Table 1 indicate that the sam- 
ple is fairly heterogeneous with regard to age, 
education, and IQ. The shape of the Full
Scale IQ distribution closely approximates
that of a normal curve (χ² of 1.846 with
5 df, .90 > p > .80).


Table 1

Range, Mean, and Standard Deviation of Age,
Education, and WAIS IQ

Variable                   Range      Mean      SD

Age (to nearest year)      19-49     31.71    7.29
Education                   7-17     11.67    2.48
IQ: Verbal                89-139    110.81   13.45
    Performance           80-134    102.45   13.18
    Full Scale            82-138    108.67   12.52
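
The probability bounds quoted for the goodness-of-fit test on the Full Scale IQ distribution can be verified numerically. The following Python sketch uses the closed-form upper-tail probability of chi-square for 5 degrees of freedom; the function name is hypothetical, and the class intervals actually used in fitting the normal curve are not reported here.

```python
import math


def chi2_upper_tail_5df(x):
    """Upper-tail probability of chi-square with 5 degrees of freedom.

    For odd degrees of freedom the tail has a closed form made up of a
    normal-tail term plus a short finite series in sqrt(x).
    """
    root = math.sqrt(x)
    normal_part = math.erfc(math.sqrt(x / 2.0))
    series_part = (math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0)
                   * (root + root ** 3 / 3.0))
    return normal_part + series_part


p = chi2_upper_tail_5df(1.846)
print(f"p = {p:.3f}")
print("within the reported bounds:", 0.80 < p < 0.90)
```

A probability this large gives no ground for rejecting normality, which is the sense of the statement in the text.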

Despite the fact that the mean WAIS Full 
Scale IQ of this sample (108.67) is fairly 
close to the hypothetical population mean 
(100.00), the Matrices scores tend to cluster 
close to the ceiling. The interval between the 
mean and the highest possible score is only 
nine points and 63% of all scores lie in this 
interval. Chi-square analysis of the shape of 
the distribution showed significant (p < .05) 
departure from both normality and symmetry. 

Because of the effects of departures from 
symmetry on the correlation coefficient, one 
of the two possible correlation ratios (the re- 
gression of the WAIS variable on Matrices) 
was computed for each of the comparisons 
between Matrices and the 14 WAIS variables 
and tests for departure from linearity of re- 
gression were done (5, pp. 268-275). Using 
the .05 probability value as the level of significance,
it was possible to retain the hypothesis
of linear regression in all comparisons
except that between Matrices and Object
Assembly. The correlation ratio expressing
the regression of Object Assembly on
Matrices is .545. Table 2 shows the means
and standard deviations of the 15 test variables
and the correlation coefficients (rs) of
Matrices with each of the 14 WAIS variables.


Table 2

Correlation of Modified Matrices with Each of the
WAIS Subtests and with the Verbal, Performance,
and Full Scale WAIS Scores

Tests                       N      Mean      SD      r

Matrices                   82     20.70    5.15
WAIS:
  Information              82     19.42    4.63   .506
  Comprehension            82     20.10    4.17   .598
  Arithmetic               82     13.38    3.03   .452
  Similarities             77     15.92    3.76   .411
  Digits                   82     11.73    2.31   .282
  Vocabulary               82     52.02   16.55   .480
  Digit Symbol             82     50.50   12.77   .538
  Picture Completion       78     14.68    3.82   .602
  Block Design             82     33.43    8.72   .642
  Picture Arrangement      67     26.24    5.98   .617
  Object Assembly          55     30.66    7.71   .366
  Verbal Scale Score       82     71.28   13.75   .584
  Perform. Scale Score     82     51.54   10.04   .705
  Full Scale Score         82    122.82   20.92   .721

Note. The Matrices and WAIS subtest statistics are based
on raw scores; those for Verbal, Performance, and Full Scale
are based on summed scaled scores. In those instances in
which the entire WAIS was not given (26 cases), the latter
three scores were secured by prorating.
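
The comparison of a correlation ratio with the product-moment correlation, as described in the text above, can be illustrated in outline. The Python sketch below uses the conventional F ratio for departure from linearity, F = [(η² − r²)/(k − 2)] / [(1 − η²)/(N − k)], which is assumed here, and not confirmed, to correspond to the procedure cited from reference (5). The data and all names are hypothetical.

```python
from statistics import mean


def correlation_ratio_sq(groups):
    """Eta squared for the regression of y on x.

    groups maps each x value (e.g., a Matrices score) to the list of y
    values (e.g., WAIS subtest scores) observed at that x.
    """
    all_y = [y for ys in groups.values() for y in ys]
    grand = mean(all_y)
    ss_total = sum((y - grand) ** 2 for y in all_y)
    ss_between = sum(len(ys) * (mean(ys) - grand) ** 2
                     for ys in groups.values())
    return ss_between / ss_total


def pearson_r_sq(groups):
    """Squared product-moment correlation from the same grouped data."""
    xs = [x for x, ys in groups.items() for _ in ys]
    ys = [y for ys_ in groups.values() for y in ys_]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


def linearity_f(groups):
    """F ratio for departure from linearity, with k - 2 and N - k df."""
    n = sum(len(ys) for ys in groups.values())
    k = len(groups)
    eta_sq = correlation_ratio_sq(groups)
    r_sq = pearson_r_sq(groups)
    return ((eta_sq - r_sq) / (k - 2)) / ((1 - eta_sq) / (n - k))


linear = {1: [2, 3], 2: [4, 5], 3: [6, 7]}   # hypothetical, linear means
curved = {1: [1, 2], 2: [5, 6], 3: [1, 2]}   # hypothetical, curved means
print(round(linearity_f(linear), 6), round(linearity_f(curved), 3))
```

When the group means fall on a straight line the two coefficients coincide and F is zero; a large F, as in the curved example, is the kind of result that would lead one to report the correlation ratio instead of r.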

Table 3

The Number of Individuals Passing Each of the 30 Items of the Modified Version of the
Progressive Matrices (1938)
(N = 82)

   Set A           Set B           Set C           Set D           Set E
Item  No.       Item  No.       Item  No.       Item  No.       Item  No.
No.*  passing   No.*  passing   No.*  passing   No.*  passing   No.*  passing

   1    82         2    81         1    78         1    82         2
   3    82         4    78         3    75         4    70         3    53
   5    80         5    74         6    67         6    67         5    39
   7    74         8    56         8    60         8    53         7    10
  10    72         9    55        10    33         9    51         9    27
  11    53        12    36        11    36        11    23        12     4

* The item numbers are those of the standard version of Progressive Matrices (1938).
Note. One subject did not attempt item D-9; two did not attempt D-11; three did not attempt E-2 and E-3; and six did
not attempt E-5, E-7, E-9, and E-11.

Reliability. In this study a time limit was
imposed and six subjects did not attempt all
the items. If the scores of these six subjects 
are excluded, the Kuder-Richardson reliabil- 
ity coefficient is .878; the reliability coeffi- 
cient for the total sample is .864. 
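
The report does not say which Kuder-Richardson formula was used. As a minimal sketch, assuming formula 20, the coefficient can be computed from a matrix of pass/fail item scores as shown below in Python. The small data set is hypothetical; formula 21, which needs only the number of items, the mean, and the standard deviation, is added for comparison using the Matrices mean and SD reported in Table 2.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for rows of 0/1 item scores."""
    n_people = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_people
    # Variance of total scores, dividing by N.
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people
    sum_pq = 0.0
    for item in range(k):
        p = sum(row[item] for row in responses) / n_people
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


def kr21(k, mean_score, sd):
    """Kuder-Richardson formula 21, which treats all items as equally hard."""
    return (k / (k - 1)) * (1 - mean_score * (k - mean_score) / (k * sd ** 2))


# Hypothetical scores for four subjects on three items.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
toy_kr20 = kr20(toy)

# Formula 21 applied to the reported 30-item mean (20.70) and SD (5.15).
reported_kr21 = kr21(30, 20.70, 5.15)

print(f"KR-20 on toy data: {toy_kr20:.3f}")
print(f"KR-21 from reported mean and SD: {reported_kr21:.3f}")
```

Formula 21 can never exceed formula 20 and falls below it when items differ in difficulty, so a value lower than the reported coefficients is what one would expect if formula 20 was in fact the one used.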

Item difficulty. The item data were ana- 
lyzed to secure information concerning the 
difficulty level of each item. Table 3 sum- 
marizes the analysis. 

It is apparent from the data in Table 3 
that there are displacements in item difficulty 
both between sets and within sets and that 
Set D as a whole is not more difficult than 
Set C. These two findings are not unique to 
the modified Matrices. Halstead (3) reports 
similar findings for the standard form of the 
test. 
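
The two observations about item difficulty can be rechecked from the counts in Table 3. The Python sketch below is illustrative only: the names are hypothetical, the counts are as read from the table, and Set E is left out because one of its entries is not legible in this copy.

```python
N = 82  # subjects in the sample

# Number passing each item, keyed by standard-form item number (Table 3).
passing = {
    "A": {1: 82, 3: 82, 5: 80, 7: 74, 10: 72, 11: 53},
    "B": {2: 81, 4: 78, 5: 74, 8: 56, 9: 55, 12: 36},
    "C": {1: 78, 3: 75, 6: 67, 8: 60, 10: 33, 11: 36},
    "D": {1: 82, 4: 70, 6: 67, 8: 53, 9: 51, 11: 23},
}

# Proportion passing each item, and the mean proportion for each set.
difficulty = {s: {item: count / N for item, count in items.items()}
              for s, items in passing.items()}
set_mean = {s: sum(p.values()) / len(p) for s, p in difficulty.items()}


def displacements(items):
    """Adjacent item pairs where the later item was passed by more subjects."""
    ordered = sorted(items)
    return [(a, b) for a, b in zip(ordered, ordered[1:])
            if items[b] > items[a]]


inversions = [(s, a, b) for s, items in passing.items()
              for a, b in displacements(items)]

for s, m in set_mean.items():
    print(f"Set {s}: mean proportion passing {m:.3f}")
print("within-set displacements:", inversions)
```

On these counts Sets C and D have nearly the same mean proportion passing, and within Set C item 11 was passed by more subjects than item 10.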


Discussion 


The reliability coefficient of the modified 
Matrices is encouraging since it compares fa- 
vorably with the Kuder-Richardson coeffi- 
cient of .90 reported by Sinha (10) for the 
60-item form of the test and compares even 
more favorably with the reliability coefficients 
(Spearman-Brown, split half) of the various 
subtests of the WAIS Performance Scale (11, 
p. 13). Since the correlations with the WAIS 
indicate that Matrices has more in common 
with the Performance Scale than with the 
Verbal Scale, the hypothesis that Matrices 
can be useful in the evaluation of brain dam- 
aged individuals is given support. The low 
ceiling of the modified Matrices, however, is 
a severe restriction on its usefulness and ef- 
forts should be directed toward remedying 
this defect. The test ceiling could be raised 
by either of two methods: (a) by placing a 
greater premium upon speed of performance 
or (b) by increasing the difficulty range of
the test. 

The first alternative has been utilized in 
the WAIS. Study of the raw score conversion 
table (11, p. 77) indicates that a major por- 
tion of the ceiling of the Performance Scale 
tests is derived from speed of performance. 
When one’s aim is to predict level of intel- 
lectual function in the daily activities of 
adults, however, ceiling secured by emphasiz- 
ing speed is of questionable value. There is a 
considerable body of evidence which indicates 
that, even in a physically intact adult population,
emphasis on speed has an adverse effect
upon the accuracy with which a score reflects
a point on a single dimension and the
predictive usefulness of scores under speed 
conditions is attenuated. Guilford in review- 
ing empirical investigations of the effects of 
speed conditions on intelligence test perform- 
ance comments: “. . . speed conditions where 
items are not very easy open the door to 
many uncontrolled determiners of individual 
differences in scores” (2, p. 369). Analysis of 
the physical and psychological factors which 
can influence intelligence test performance 
suggests that in a patient population per- 
formance on speeded tests is particularly 
susceptible to the vitiating effects of irrele- 
vant situational variables. 

The considerations discussed above indi- 
cate that the second method of raising the 
ceiling of the modified Matrices, i.e., increas- 
ing its difficulty, is the method of choice. 
Progressive Matrices (1938) in its standard 
form does not have sufficient ceiling to dis- 
criminate among a superior group. Study of 
the item analysis done in this study and that 
of Halstead (3) suggests, however, that a 
selective method of item choice could yield a 
modification with adequate ceiling for popu- 
lation samples having mean IQs similar to 
that of the sample used in the study reported 
here. The item analyses indicate that a num- 
ber of the easy items should be eliminated 
and replaced with items at varying levels of 
difficulty, the selection of items being directed 
toward raising the ceiling and smoothing the 
ascent along the scale of difficulty. 


Summary 


The reliability, item difficulty, and correlation
with the WAIS of a modified (30 item)
form of Progressive Matrices (1938) were investigated.
The following findings are reported.

1. The Kuder-Richardson reliability coeffi- 
cient for the modified version of Matrices is 
.864 (N = 82).

2. Correlation of modified Matrices with 
the WAIS Performance Scale score is .705; 
with the Verbal Scale score, .584; and with 
the Full Scale score, .721. The difference in 
the correlations with the Verbal and Performance
Scales suggests that Matrices may be a
useful complement to the Verbal Scale in 
evaluating the intellectual function of brain 
damaged individuals. 

3. A severe shortcoming of modified Ma- 
trices was its low ceiling. The score distribu- 
tion showed significant departure from both 
normality and symmetry. An analysis of item 
difficulty indicates that a reduction in the 
number of easy items and their replacement 
with items of greater difficulty probably would 
result in a modification having more adequate 
discriminative power. 


Received May 3, 1956. 
The Use of Doppelt’s Short Form of the Wechsler
Adult Intelligence Scale with Psychiatric Patients 


Tom D. Olin and Marvin Reznikoff 


The Institute of Living 


In a recent paper, Doppelt (1) proposed 
an abbreviation of the Wechsler Adult In- 
telligence Scale (WAIS) which would pro- 
vide the examiner with an adequate estimate 
of the subject’s IQ without giving all eleven 
subtests of the scale. His short form is com- 
posed of four subtests: Arithmetic, Vocabu- 
lary, Block Designs, and Picture Arrange- 
ment. Using the population in the national 
standardization of the WAIS (2), he ob- 
tained a correlation coefficient between the 
sum of these four subtests and the Full Scale 
score of approximately .96. Regression equa- 
tions were computed for various age groups 
for use in predicting the Full Scale score 
from the sum of the four subtests. The stand- 
ard error of estimate was determined to be 
approximately 7 scaled score points (4.2 IQ 
points). To check his predictive equations, 
Doppelt applied them to two groups of sub- 
jects not used in his statistical analysis and 
found that in 71% of his cases the differences 
between his obtained and estimated Full 
Scale scores were within one standard error 
(±7); two standard errors (±14) contained
96% of the cases.

Since Doppelt’s procedure appears to per- 
mit an adequate estimate of the IQ in a rela- 
tively brief period of time, it is potentially of 
considerable usefulness in many situations. 
The question arises, however, whether such 
an abbreviated procedure, derived from data 
obtained from a presumably normal popula- 
tion, can be employed reliably in an emo- 
tionally disturbed population. There is con- 
siderable evidence, for instance, that schizo- 
phrenic patients show significantly greater 
subtest scatter than do normals (3). The 
purpose of this study, therefore, is to determine
whether the Doppelt short form of the
WAIS would provide adequate estimates of 
the Full Scale scores in a disturbed popula- 
tion. 


Procedure and Results 


The subjects included in this study con- 
sisted of all of the patients at a private psy- 
chiatric hospital who had been given the full 
WAIS as part of a routine psychological ex- 
amination. Six patients having final diagno- 
ses of a primarily organic nature were elimi- 
nated from the group, however, in order to 
limit the subject population to functional 
disorders. The remaining subject group con- 
sisted of 107 patients having varied schizo- 
phrenic and neurotic diagnoses and manifest- 
ing a wide range of behavioral deviations. 
Forty-four per cent of the patients were men, 
and 56% were women. The mean age was 
36.5 years and the range was 16 to 69 years. 
WAIS IQs ranged from 78 to 135 with a 
mean of 108. Approximately 78% of the 
cases fell above an IQ of 100. 

The four weighted scores were summed and 
the Full Scale scores were estimated using 
the simplified regression equations in Table 4 
of Doppelt’s article. A correlation of .925 was 
found between the obtained Full Scale score 
and the sum of the four subtests; the stand- 
ard error of estimate was computed to be 7.9 
points. These results are in good agreement 
with Doppelt’s correlation of .96 and stand- 
ard error of estimate of 7 points. It is to be 
noted that the standard deviation of the ob- 
tained Full Scale scores of this disturbed 
group was 20.79 compared with Doppelt’s 
figure of 25. This lower standard deviation 
partially compensates for the lower correlation
in the computation of the standard error
of estimate. 
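
The compensation noted above can be checked with the usual formula for the standard error of estimate, the standard deviation of the criterion multiplied by the square root of 1 minus r squared. The sketch below is only an illustration using the figures quoted in the text; the function name is a hypothetical one chosen for this example.

```python
import math


def standard_error_of_estimate(sd_criterion, r):
    """Standard deviation of the criterion times sqrt(1 - r^2)."""
    return sd_criterion * math.sqrt(1.0 - r ** 2)


# Figures quoted in the text: SD of obtained Full Scale scores and correlation.
patients = standard_error_of_estimate(20.79, 0.925)
doppelt = standard_error_of_estimate(25.0, 0.96)
print(round(patients, 1), round(doppelt, 1))  # 7.9 7.0
```

The smaller standard deviation of the patient group keeps its standard error of estimate within a point of Doppelt's value despite the lower correlation.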

Table 1 presents the distribution of differ- 
ences between the obtained and predicted 
Full Scale scores for the 107 patients in- 
cluded in this study. The distribution closely 
approximates Doppelt’s findings; 70.1% of 
the cases fall within Doppelt’s standard error 
(7 points) as compared with 71% of Dop- 
pelt’s samples. The higher mean deviation of 
+2.5 as compared with Doppelt’s values of
-0.1 and -0.5 suggests that Doppelt’s regression
equations tended to underpredict the
IQs in this emotionally disturbed group. As 
can be seen from Table 1, approximately 
twice as many cases were underpredicted as 
were overpredicted. Accuracy of prediction 
might be improved for the disturbed group 
by utilizing regression equations computed 
specifically for such a population. 

Table 2 presents the differences in actual 
IQ points between the obtained and predicted 
Full Scale IQs. Seventy-two per cent of esti- 
mated IQs deviate 4 points or less from the 
obtained IQs. In 9.3% of the cases there 
was perfect prediction. 
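
The percentages cited in the two preceding paragraphs can be recovered from the frequency distributions. The following sketch tallies the frequencies as read from Tables 1 and 2 (obtained minus predicted); the dictionary layout and function names are assumptions made for this illustration only.

```python
# Frequencies as read from Tables 1 and 2 (obtained minus predicted), N = 107.
# Keys are (lowest difference, highest difference) for each class interval.
table1 = {(15, 21): 4, (8, 14): 22, (1, 7): 46, (0, 0): 4,
          (-7, -1): 25, (-14, -8): 5, (-21, -15): 1}
table2 = {(13, 16): 1, (9, 12): 2, (5, 8): 20, (1, 4): 42, (0, 0): 10,
          (-4, -1): 25, (-8, -5): 5, (-12, -9): 2}


def percent_within(table, limit):
    """Percentage of cases whose interval lies wholly within +/- limit."""
    total = sum(table.values())
    inside = sum(f for (lo, hi), f in table.items()
                 if -limit <= lo and hi <= limit)
    return 100.0 * inside / total


def under_over(table):
    """Counts of underpredicted (positive) and overpredicted (negative) cases."""
    under = sum(f for (lo, hi), f in table.items() if lo > 0)
    over = sum(f for (lo, hi), f in table.items() if hi < 0)
    return under, over


print(round(percent_within(table1, 7), 1))  # 70.1
print(round(percent_within(table2, 4), 1))  # 72.0
print(under_over(table1))                   # (72, 31)
```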


Table 1

Distribution of Differences Between Obtained Full Scale
Scores and Predicted Full Scale Scores *

Difference       Frequency    Percentage

+15 to +21            4           3.7
 +8 to +14           22          20.6
 +1 to  +7           46          43.0
      0               4           3.7
 -7 to  -1           25          23.4
-14 to  -8            5           4.7
-21 to -15            1           0.9

* Obtained Full Scale score minus the predicted Full Scale
score.


Table 2

Distribution of Differences Between Obtained Full Scale
IQ and Predicted Full Scale IQ *

Difference       Frequency    Percentage

+13 to +16            1           0.9
 +9 to +12            2           1.7
 +5 to  +8           20          18.7
 +1 to  +4           42          39.3
      0              10           9.3
 -4 to  -1           25          23.4
 -8 to  -5            5           4.7
-12 to  -9            2           1.7

* Obtained Full Scale IQ minus the predicted Full Scale IQ.


Summary 


An attempt was made to check the ac- 
curacy of Full Scale score prediction using 
the WAIS short form as proposed by Dop- 
pelt with a psychiatrically disturbed popula- 
tion. A correlation of .925 was obtained be- 
tween the sum of the four subtests and the 
obtained Full Scale score. The standard error 
of estimate was computed to be 7.9 scaled 
score points. The results suggest that the 
Doppelt short form yields reasonably accu- 
rate prediction of IQs in a disturbed popu- 
lation. 
Performance of Children on the Davis-Eells Games
and Other Measures of Ability 


Mary I. Love and Sylvia Beach 


Cincinnati Public Schools 


Social scientists who work with children 
from a wide variety of backgrounds have long 
felt a need for tests in which verbal material 
and specific cultural factors would be of mini- 
mal importance. Existing group tests of in- 
telligence, with their frequent emphasis on 
vocabulary and reading, have appeared to im- 
pose a particular hardship on children with 
reading disabilities or those from groups 
where verbal communication is limited by 
isolated or unusual geographical or cultural 
settings. 

Several “culture-fair” tests have been de- 
veloped to try to circumvent these difficulties. 
The recently published Davis-Eells Test of 
General Intelligence or Problem-Solving Abil- 
ity (1) is one devised by a sociologist and an 
educational psychologist. 


The test is presented in such a form as to require 
the pupil to understand and respond to a variety of 
verbal material, but is entirely free of any reading 
requirements. . . . The situations represented by the 
items deal with kinds of problems such that chil- 
dren from different kinds of backgrounds will have 
more nearly equal opportunity for familiarity with 
the necessary experiences. . . . The verbal material
used in administering the test has been carefully
screened . . . to eliminate words (or grammatical
constructions) which would be more familiar in 
some groups than others (1, p. iv). 


Method 


Children in the 4th grades of the Cincinnati 
Public Schools are regularly given one of the 
traditional group tests of intelligence. In the 
school year 1953-54 they received Series D of 
the Sixth Edition of the Kuhlmann-Anderson 
Test (2), scored by the 1942 revised norms, 
administered by examiners from the school 
system’s Appraisal Service. For purposes of
comparison, 4th grade children from eight
schools were given the Davis-Eells Games, 
Elementary Form A, administered by a school 
psychologist. The Kuhlmann-Anderson Tests 
were given in the fall, the Davis-Eells Games 
from five to eight months later. 

The schools selected for the present study 
included four with populations predominantly 
from a lower socioeconomic level, two from 
an upper socioeconomic level, and two more 
nearly representative of a middle group. Cin- 
cinnati’s topography lends itself to defining 
such levels almost geographically, with people
of the lower economic levels living in the
Basin, or River-Bottom, area, representatives
of the middle group living on the first hills, 
and members of the higher group living far- 
ther out on the hills. There are, of course, 
exceptions, but the predominant level in each 
school district could thus be categorized. A 
total of 469 children received both tests. At 
the time they were given the Kuhlmann-An- 
derson Test the children ranged in age from 
8-6 to 12-4, with a median age of 9-7. The 
children were given the Davis-Eells Games
when their age range was 8-11 to 12-10, 
with a median age of 10-1. Table 1 shows 
the distribution according to socioeconomic 
level, race, and median age when tested. 

In the fall of 1954, 341 of the same chil- 
dren were given the California Reading Test, 
Elementary-BB (5). Although the number of 
children was smaller, the representative per- 
centage of the groups was not greatly differ- 
ent with 33% of children from the upper 
level, 27% from the middle, and 40% from 
the lower. 

Table 1

Distribution of Children Given Kuhlmann-Anderson and Davis-Eells Tests
According to Age, Race, and Socioeconomic Level

Socioeconomic   Number of   Number of             Percent     Median age
level           white       Negro       Total     of total    for K-A

Upper             137                    137         29          9-5
Middle             56          62        118         25          9-7
Lower              97         117        214         46          9-9

Total             290         179        469

As a subsidiary study, 110 3rd grade children
in one of the predominantly lower-class
schools were given the Davis-Eells Games and
the California Test of Mental Maturity (4)
when their median chronological age was 9-0.
All of these children were Negroes.


Results 


Kuhlmann-Anderson scores are translated 
into mental age units, and thence into intelli- 
gence quotients. The authors of the Davis- 
Eells Test recommend that the measure de- 
rived from their test be called an “Index of 
Problem Solving Ability,” or IPSA, but add, 
“Since the Index of Problem Solving Ability 
is computed in the same way that many test 
makers compute an intelligence quotient, 
those users of the Davis-Eells Test who pre- 
fer the term ‘IQ’ may quite appropriately ap- 
ply this term to what the authors prefer to 
call the Index of Problem Solving Ability” 
(1, p. 29). In this paper the term IQ is used 
to refer to the performance on either test un- 
der consideration. 
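
As an illustration of what is meant by computing a quotient from mental age units, the sketch below assumes the classical ratio definition (mental age divided by chronological age, multiplied by 100) and the years-months notation used for ages in this report. The published norms of either test may depart from this simple ratio, and the function names and the example child are hypothetical.

```python
def age_in_months(age):
    """Convert a 'years-months' string such as '9-7' to months."""
    years, months = age.split("-")
    return int(years) * 12 + int(months)


def ratio_iq(mental_age, chronological_age):
    """Classical ratio IQ: 100 * MA / CA (an assumed definition)."""
    return 100.0 * age_in_months(mental_age) / age_in_months(chronological_age)


# Hypothetical child: mental age 9-0 at chronological age 10-0.
print(round(ratio_iq("9-0", "10-0")))  # 90
```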

The mean Kuhlmann-Anderson IQ of the
469 children studied was 100.7 with a standard
deviation of 16.05. The median was 100.3.
On the Davis-Eells the mean IQ was 90.1,
with a standard deviation of 15.96. The median
was 88.2. The correlation between the
tests was .53, significant beyond the 1% level.
Table 2 shows the distribution of means according
to socioeconomic level.

Table 2 indicates that the children in the
present study consistently rated higher on the
Kuhlmann-Anderson than on the Davis-Eells.
This trend was noted at all levels, with the
difference being greater at the upper level
(both IQ and socioeconomic) than at the
lower. Differences between all pairs of means
were statistically significant with a t beyond
the 1% level in every instance. That is, differences
between the means of different socioeconomic
groups on the same test were significant,
as were differences between the
means on the two tests.

The correlation of the reading test scores
with the IQs appears in Table 3.

Children also rated consistently lower on
the Davis-Eells than on the California Test
of Mental Maturity as shown in Table 4. In
this case also the differences between the
means were significant.


Table 2

Distribution of Means on the Davis-Eells and
Kuhlmann-Anderson Tests

Socioeconomic   Kuhlmann-Anderson      Davis-Eells
level           Mean       SD          Mean       SD

Upper           115.9
Middle           99.9      13.2        88.3
Lower            91.8      12.9        84.2
All cases       100.7                  90.1


Table 3

Correlations Between Scores on the California Reading
Test (Elementary-BB) and IQs on the Davis-
Eells and Kuhlmann-Anderson

Socioeconomic   Kuhlmann-
level           Anderson

Upper             .66
Middle            .69
Lower             .69
All cases         .80
Table 4

Means and Correlations of the Davis-Eells and the
California Test of Mental Maturity

Test                                    Mean

Davis-Eells                             82.5
California Test of Mental Maturity
  Language Factors                      87.2
  Nonlanguage Factors                   91.9
  Total


Discussion 


Determining the nature or direction of 
characteristic variations in performance on 
the Davis-Eells and the traditional tests was 
the principal purpose of this study. The ra- 
tionale of the Davis-Eells, as expressed by its 
authors, implies that existing tests of intelli- 
gence tend to penalize some groups while the 
Davis-Eells cuts across verbal and cultural 
factors. The uniform tendency for ratings on 
the Davis-Eells to be lower than the ratings 
on the other tests does not confirm this hy- 
pothesis. The hypothesis was also somewhat 
challenged by the fact that the differences 
between the mean performances of children of 
lower, middle, and upper socioeconomic levels 
were statistically significant on the Davis- 
Eells, even as on the Kuhlmann-Anderson. 

One possible explanation of the lower rat- 
ings on the Davis-Eells is that the stand- 
ardization group was more highly selected 
than the test’s authors intended. Another pos- 
sibility would seem to lie in the physical 
structure of the test. The pictures are care- 
fully drawn with many minute details which 
might tend to handicap children with uncor- 
rected visual difficulties, of whom there are 
doubtless more among lower socioeconomic 
levels than among higher. The two test ques- 
tions which an item analysis showed to be 
most frequently missed by 4th graders were 
two hinging on particularly small pictorial 
details (numbers 19 and 10). 

It would also seem that the necessity for 
constant attentiveness to spoken instructions 
may impose a hardship on some children. 
However, none of the highly verbal “money” 
problems rates among the 7 most difficult, 
although there were 39 “no answers” on
these 8 problems, as opposed to 30 “no answers”
on the 54 remaining test items.

Another possible explanation for the lower 
ratings on the Davis-Eells might lie in the na- 
ture of the test material. While one of the 
merits of the test is its use of familiar prob- 
lem situations, one wonders whether this 
might not also be a source of weakness, in 
that some of the depicted episodes may be so 
realistic as to carry a high emotional content 
for the child and create something of a shock 
situation which would handicap his function- 
ing. Further work in this area might be profit- 
able. Studies at the Wayne County Training 
School have suggested that “the emotional 
loading of some of these pictures and the 
ambiguity of others would stimulate the ex- 
pression of our subjects’ needs, thus interfer- 
ing with the required intellectual activity” 
(3, p. 497). 

The correlations of the Kuhlmann-Anderson 
and the Davis-Eells with the California Read- 
ing test were interesting. There was an un- 
usually high correlation between the Kuhl- 
mann-Anderson and the reading test for all 
groups combined, and a moderately high cor- 
relation for the separate groups. The Davis- 
Eells correlation with reading achievement 
was in every instance lower, and with the 
lowest socioeconomic group it was extremely 
low, suggesting that the test may measure 
mental ability independently of reading abil- 
ity for that group where stimulation toward 
reading achievement at home is not so gen- 
erally a part of the culture. For the middle 
and upper groups, which may be assumed to 
have more cultural pressure toward reading, 
success on the Davis-Eells is about as much 
related to reading achievement as success on 
the more verbal Kuhlmann-Anderson. 


Summary 


The Davis-Eells Games were administered 
to 579 third and fourth grade children, 469 
of whom were given the Kuhlmann-Anderson 
Tests, and 110 of whom were given the Cali- 
fornia Test of Mental Maturity. The mean 
scores on the Davis-Eells Games were signifi- 
cantly lower than the mean scores on either 
of the traditional-type tests of intelligence. 

The California Reading Test was also given 
to 341 children receiving both the Davis-Eells
and the Kuhlmann-Anderson, and the
results were correlated. There was a high 
positive correlation between reading achieve- 
ment and performance on the Kuhlmann-An- 
derson for all socioeconomic groups, and a 
relatively high correlation between reading 
scores and Davis-Eells performance for mid- 
dle and upper socioeconomic groups. With the 
lowest socioeconomic group success on the 
Davis-Eells was only slightly related to read- 
ing ability, suggesting that for children of 
this category the test is divorced from read- 
ing achievement, if not from other cultural 
determinants. 


Received May 21, 1956. 
Prediction from the Cattell Infant Intelligence Scale


Maxine C. Cavanaugh, Ira Cohen, Donal Dunphy, Egan A. Ringwall 
University of Buffalo 


and Irving D. Goldberg 
New York State Department of Health 


A number of investigators have shown that 
various infant psychometric tests have poor 
prognostic value (1, 2, 3, 7). Cattell reported 
some encouraging data derived from her
standardization of the Cattell Infant Intelli- 
gence Scale (CIIS) particularly with respect 
to infants eighteen months of age and older 
children. She cautioned against using scores 
from very young infants for predictive pur- 
poses. Gallagher (5) reports a high correla- 
tion between nine-month and sixteen-month 
CIIS scores on the same infants. The wide- 
spread use of the CIIS and the paucity of 
data on its predictive efficiency call for fur- 
ther research. 

The primary purpose of this report is to 
present some findings related to the predictive 
value of the CIIS when administered to
six-month-old infants. These findings were derived
from a longitudinal study being conducted
at Children’s Hospital, Buffalo, New
York. 


Description of the Child Growth Study 


The plan and method were described in de- 
tail in a previous report (8). Briefly, the 
study was planned to evaluate the effects of 
trauma to the central nervous system, anoxia, 
and other possible influences on gestation, de- 
livery, the neonatal period, and the pre-school 
child. 


The Child Growth Study from which these data
were derived was developed through the cooperative
efforts of the Department of Pediatrics and Obstetrics
of the University of Buffalo School of Medicine
and the New York State Department of Health.

The Study subjects were infants born at Children’s
Hospital, Buffalo, New York within the period from
September 1949 to December 1953. Three children
were born in other hospitals within the city. Ini- 
tially, there was no preferential selection of cases 
for inclusion in the Study, except that for con- 
venience the majority of children included were born 
during the daytime hours. Since one purpose of the 
study was concerned with neurological factors, some 
preferential selection was given subsequently to chil- 
dren born by Caesarean section, induction, labors of 
long duration, and other potential or definite varia- 
tions from a normal delivery. 

All children observed received a physical examination
and when possible an electroencephalogram
shortly after birth. The general course of behavior 
during the neonatal period was noted and follow-up 
visits were scheduled at 6, 12, 18, 24, 36, 48 and 60 
months of age. At each visit the child was sched- 
uled to receive a physical and neurological exami- 
nation, an electroencephalogram, and an intelligence 
test, except at 18 months of age when only the psy- 
chological examination was routinely scheduled. In 
addition to intelligence testing, the psychologist spent 
a portion of the visit obtaining data from the parents 
concerning family life, the child’s behavior and ad- 
justment. 

The CIIS was administered to children from 6 
through 24 months of age and the Revised Stanford 
Binet Form L (SB) thereafter. Over the period of 
the study, four psychologists have been employed at 
different times. At present all scheduled examinations 
through 18 months of age are completed, and some 
children admitted early in the study have been ob- 
served for five years. 

The great majority of children were tested within 
one month of the age designated. The six-month 
group included children seen between 5 and 7 months 
of age and a few at 8 months of age; the 12- through 
36-month groups did not deviate by more than two 
months from the planned date of visit (except for 
one child aged 27 months who was included in the 
24-month group); only for the 48- and 60-month 
examinations did the groups contain children who 
deviated more than two months. 

In scoring the CIIS the CA was calculated to the 
nearest tenth of a month from date of birth. Distinction
was made for this report between “valid”
and “questionably valid” tests according to the
examiner’s judgment. A test was termed questionably
valid when the examiner felt a child was not doing 
his best for reasons of irritableness, fatigue, lack of 
cooperation, etc. Except where otherwise noted, only 
tests rated as “valid” were used in the analysis of 
our data. The study children are somewhat above 
the mean intellectually. This is shown by Fig. 1, 
which presents the distribution of Binet IQs ob- 
tained from the first 50 children who had com- 
pleted the five-year examination. 


In studying the effects of gestational maturity
on the CIIS, infants were classified as
premature if their gestation was less than 42
weeks and their birthweight was 5 pounds 8
ounces or less. Mature infants were defined as
those weighing more than 5 pounds 8 ounces and
having a gestational age of less than 42 weeks.
The remainder, those with a gestational age of 42
weeks or over, comprised the postmature
group.
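
The three-way classification just described can be written out as a short rule. The sketch below is only an illustration of the stated criteria; the function name, the constant names, and the choice of ounces as the unit of birthweight are assumptions made for the example.

```python
PREMATURE_MAX_OUNCES = 5 * 16 + 8  # 5 pounds 8 ounces, i.e., 88 ounces
POSTMATURE_MIN_WEEKS = 42


def classify_maturity(gestation_weeks, birthweight_ounces):
    """Apply the maturity criteria stated in the text."""
    # Gestation of 42 weeks or over is postmature regardless of weight.
    if gestation_weeks >= POSTMATURE_MIN_WEEKS:
        return "postmature"
    # Under 42 weeks: the birthweight decides between premature and mature.
    if birthweight_ounces <= PREMATURE_MAX_OUNCES:
        return "premature"
    return "mature"


print(classify_maturity(36, 80))   # premature
print(classify_maturity(40, 120))  # mature
print(classify_maturity(43, 80))   # postmature
```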

Results 


The mean IQs obtained from all children
who had a “valid” psychological examination
at 6, 12, 18, or 36 months of age are shown
in Table 1 according to maturity classification.
The mean IQ of prematures was significantly
lower (p < .001)
than that of the matures or postmatures at 6 
months of age. Further, the mean IQ of the 
postmatures was significantly higher (p = 
.04) than that of the matures. 
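
The report does not state which test of significance was used. As a rough check, the sketch below assumes a large-sample critical ratio for the difference between two independent means, evaluated against the normal curve, and applies it to the 6-month means, standard deviations, and numbers given in Table 1. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def critical_ratio(mean1, sd1, n1, mean2, sd2, n2):
    """Difference between means over its standard error, with two-tailed p."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# 6-month figures from Table 1: (mean IQ, SD, number).
premature = (91.2, 14.6, 40)
mature = (106.4, 8.9, 252)
postmature = (109.6, 9.2, 41)

z_pre, p_pre = critical_ratio(*premature, *mature)
z_post, p_post = critical_ratio(*postmature, *mature)
print(round(z_pre, 2), p_pre < 0.001)      # about -6.4, True
print(round(z_post, 2), round(p_post, 2))  # about 2.07, 0.04
```

Under these assumptions both results agree with the probability levels quoted above.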

The effects of prematurity on IQ were still
highly significant at 12 months of age though
somewhat diminished. The mean IQ scores for
matures and postmatures were nearly identical
(102.2 and 102.4) at 12 months of age. In the
18- and 36-month examinations, however,
between-group differences were not significant.
The increment in IQ for prematures after 6
months is not due to any bias of selection,
since this finding was upheld when the same
children were studied at successive examinations.
While the premature tended to score lower than
the full-term infant at 6 months, this is no
longer true at 18 months of age.

Fig. 1. Distribution of Stanford-Binet five-year
IQs for 50 children with completed examinations.
Mean IQ = 117.2, SD = 12.7.


Table 1

                                   Maturity
                        Premature    Mature    Postmature

6 months
  Number                    40         252         41
  Mean IQ (Cattell)       91.2       106.4      109.6
  SD                      14.6         8.9        9.2
12 months
  Number                    29         190         34
  Mean IQ (Cattell)       94.4       102.2      102.4
  SD                       7.4         8.6        9.6
18 months
  Number                    24         138         20
  Mean IQ (Cattell)      102.4       105.3      103.2
  SD                       8.3         8.4        6.3
36 months
  Number                    11          56         11
  Mean IQ (Binet)        117.1       119.6      117.2
  SD                      11.5        13.9        9.5

To obtain some indication of the stability 
of the IQ at 6 months, those children from 
whom a “valid” test was obtained at both 
6 and 36 months were studied. The scores 
achieved at these two age levels were sub- 
divided into three broad groups: under 91, 
91 through 116, and 117 or over (approxi- 
mating Terman’s classification). This division 
provides grouping of below-average, average, 
and above-average scores. A contingency table 
was prepared using these classifications to de- 
termine the extent to which children with low, 
average, and high scores at 6 months of age 
remained in the same category at 3 years of 
age. There were 57 children (excluding seven 
prematures) who had “valid” scores on both
the 6- and 36-month tests. The distribution 
of scores is given in Table 2. 
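
The construction of such a contingency table can be sketched as follows. The three score bands are those given in the text; the function names and the five pairs of scores are hypothetical and are not data from the study.

```python
from collections import Counter


def iq_band(iq):
    """Assign an IQ to one of the three broad groups used in the text."""
    if iq < 91:
        return "under 91"
    if iq <= 116:
        return "91-116"
    return "117 and over"


def contingency(pairs):
    """Count (6-month band, 36-month band) combinations."""
    return Counter((iq_band(early), iq_band(late)) for early, late in pairs)


# Hypothetical (6-month, 36-month) IQ pairs, not data from the study.
pairs = [(88, 112), (104, 121), (110, 109), (119, 125), (95, 118)]
table = contingency(pairs)
print(table[("91-116", "117 and over")])  # 2
```

Each cell of the resulting table counts the children who fell in a given band at 6 months and a given band at 36 months; the diagonal cells are the children who remained in the same category.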

Table 2
Change in IQs at 6 and 36 Months of Age

                     36-Month IQ (Stanford-Binet)
6-Month IQ        Under                  117 and
(Cattell)         91         91-116      over        Total

Under 91          --            2         --            2
91-116             1           14         25           40
117 and over      --            6          9           15

Total              1           22         34           57


The discrepancy between the 6-month and
36-month scores is clear. The 6-month score
is generally lower than that which a given
child attained at 36 months. Neither of the
two children with an IQ under 91 at 6 months 
remained in that category at three years of 
age, and both of these children had achieved 
a score of more than 110 at the later age. 
Likewise, more than 60 per cent of those 
with an “average” IQ initially had scored 
“high” at 36 months. Further analysis of 
these data indicates that this shift in scores 
was not confined to IQs around the group 
divisions, but rather was widespread through- 
out the entire range of scores. 

When the distribution of scores, excluding 
prematures, was divided into thirds at 6 
months and at 36 months, it was found that 
only 40 per cent of the children remained in 
the same third of the distribution. For ex- 
ample, of the 19 cases comprising the upper 
third at 6 months only six were included in 
the upper third at 36 months, with eight of 
the remaining thirteen lying in the lowest 
third at 36 months. 

In addition, the Pearsonian r was calculated 
for all possible age combinations of the 
“valid” IQ scores between 6 and 48 months. 
Because of the extreme effect of prematurity 
on 6- and 12-month scores, children born pre- 
maturely were excluded from consideration of 
all correlations with the 6- and 12-month 
IQs. The correlation coefficients are presented 
in Table 3. 

From the table it is clear that the correla- 
tion of the 6-month IQ with that at later ages 
is low. In fact, only in correlation of the 6- 
with the 12-month IQ was the coefficient statistically
different from zero. The significant r in the
6- by 12-month correlation may be due to the
similarity of the test items at these age levels.
In contrast, the
correlation coefficient at every other age save
one (12-24 months) was significant at the 
.01 level. The absence of meaningful correla- 
tion of the 6-month IQ with those at the 
subsequent ages demonstrates the inadequacy 
of the 6-month IQ as a predictor of subse- 
quent test scores. 

In an evaluation of the “trend” in these 
correlation coefficients, consideration must be 
given to possible bias in the selection of cases. 
All tests were completed for a limited number 
of children; hence, the difference in numbers 
available for study at various ages. Thus, the 
68 children used in the comparison of the 36- 
and 48-month IQ are those born at the beginning
of the study, while the 191 children compared
at 6 and 12 months are those drawn
from the entire study population. 

Excluding the prematures, there were 16 
children who received all psychological tests 
from 6 through 60 months of age and who 
were considered to have given a “valid” score 
at each test. An additional group of 18 chil- 
dren had completed all tests, but for each 
child in this group at least one of the seven 
tests was considered to be of only “question- 
able validity.” Although the difference be- 
tween these two groups in the mean IQs at 


Table 3

IQ Correlation Coefficients

Ages compared (months)

6 X 12
6 X 18
6 X 24
6 X 36
6 X 48

12 X 18
12 X 24
12 X 36
12 X 48

18 X 24
18 X 36
18 X 48

24 X 36
24 X 48

36 X 48

** Significant at the .01 level.





Table 4

Correlation Coefficients of IQs for 34 Children Who
Completed All Tests from 6 Through 60 Months

Ages compared              Ages compared
(months)          r        (months)          r
6 X 12           .32       18 X 24          .48**
6 X 18           .32       18 X 36          .21
6 X 24           .08       18 X 48          .23
6 X 36           .05       18 X 60          .19
6 X 48           .26
6 X 60           .21       24 X 36          .46**
                           24 X 48          .38*
12 X 18          .36*      24 X 60          .24
12 X 24         -.01
12 X 36      [illegible]   36 X 48      [illegible]
12 X 48      [illegible]   36 X 60      [illegible]
12 X 60          .37*
                           48 X 60          .69**

* Significant at .05 level.
** Significant at .01 level.


each age was not statistically significant, gen- 
erally the “questionably valid” scores ap- 
peared to be slightly lower than the scores 
considered to be “valid.” Table 4 presents 
the correlation coefficients (similar to those 
of Table 3 with the addition of the 60-month 
correlations) based on the combined group of 
34 children. Although some slight error may 
be present due to the inclusion of the “ques- 
tionably valid” scores, it is probable that this 
would be offset by increase in size of the 
combined group. 

The patterns observed in Table 3 are mani- 
fest in Table 4. Starting with the 18-month 
IQs, the greater the intertest interval the 
lower the correlation. As was also evident in 
Table 3, an appreciable correlation is first 
noted when the 36-month IQ is compared 
with the score at a subsequent age. This may 
be related to the fact that the SB was used 
beginning with age 3. The lack of significance 
of coefficients in Table 4, which are signifi- 
cant in Table 3 is probably due to the smaller 
numbers involved. However, Table 4 con- 
firms the previously noted absence of signifi- 
cant correlation of 6-month IQs with those at 
later ages. 
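The dependence of significance on the number of cases can be made concrete with the usual t test of a correlation against zero. The sketch below is illustrative only: `t_from_r` and `stars` are hypothetical names, and the critical values are the two-tailed table values of t for 32 df, which assumes that all 34 children enter every coefficient.

```python
from math import sqrt

def t_from_r(r, n):
    """t statistic (n - 2 df) for testing a Pearson r against zero."""
    return r * sqrt((n - 2) / (1.0 - r * r))

N = 34                      # children with all tests completed
T_05, T_01 = 2.037, 2.738   # two-tailed critical t for 32 df (table values)

def stars(r):
    """Significance marker for a coefficient based on N = 34 cases."""
    t = abs(t_from_r(r, N))
    if t >= T_01:
        return "**"
    if t >= T_05:
        return "*"
    return ""

for r in (0.32, 0.36, 0.46):
    print(r, round(t_from_r(r, N), 2), stars(r))
```

With 34 cases a coefficient of .32 falls short of the .05 level while .36 reaches it, so a coefficient of a given size that is significant in a larger group may well fail to reach significance here.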

An analysis of variance of the IQ for the 34 
children who had completed all tests through 
5 years of age, disclosed no difference among 
the mean scores (SB) at 36, 48, and 60
months of age. Likewise, no differences were
found among the mean scores obtained from 
the CIIS at 12, 18, and 24 months of age. 
However, the over-all mean of the SB scores 
(36-60 months) was higher and statistically 
different at the .01 level from the mean of the 
CIIS scores (12-24 months). It would appear 
that those functions tapped by the infant 
scale are different from and not directly pre- 
dictive of the functions measured by the SB. 
Although it has been stated that the CIIS 
(4) is a downward extension of the SB, the 
analysis of our data does not bear this out. 
Further, the mean score at 6 months was significantly
different (p < .01) from the mean
scores at 12, 18, and 24 months. This supports
the conclusion already noted regarding the in- 
adequacy of the 6-month IQ as a predictive 
index of later test scores. 

The pattern of the mean IQ at each age for 
the 34 children with all tests completed is 
depicted in Fig. 2. The similarity of mean 
scores between 12 and 24 months, the simi- 
larity of mean scores between 36 and 60 
months, the difference between these two 
groups, and the uniqueness of the 6-month 
IQ are all readily apparent. The fact that the 
upper 99 per cent confidence limit at six 
months overlaps lower limits at 36, 48, and 
60 months should not be construed to imply 
that the differences between the 6-month 
score and those at later ages are not significant
statistically. The appropriate test, which [...]
children are involved throughout, does demonstrate
significance.

Fig. 2. Mean IQs for 34 children who completed all [...]
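Why overlapping confidence limits do not rule out a significant difference when the same children are tested at both ages can be shown with a small sketch. The scores below are hypothetical and chosen only to make the contrast plain; the function names are likewise illustrative, and the source does not specify which test the authors used.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second):
    """t for the mean of within-child differences (n - 1 df)."""
    diffs = [b - a for a, b in zip(first, second)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

def unpaired_t(first, second):
    """t that treats the two sets of scores as independent groups."""
    se = sqrt(stdev(first) ** 2 / len(first) + stdev(second) ** 2 / len(second))
    return (mean(second) - mean(first)) / se

# Hypothetical IQs for the same six children at two ages.
early = [85, 95, 105, 115, 125, 135]
late = [90, 101, 109, 121, 129, 141]

t_paired = paired_t(early, late)
t_unpaired = unpaired_t(early, late)
```

In this example every child gains a few points, so the paired statistic is large, whereas the wide spread between children keeps the independent-groups statistic small and the two sets of confidence limits heavily overlapping.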


Discussion 


The findings reported above confirm the 
conclusions of some previous investigators 
that the intelligence scores of children below 
two years of age are of little value in predic- 
tion of subsequent IQ scores. The correlations 
obtained in the present study are consistently 
lower than those reported by Cattell at the 
same age levels. Despite the relatively high 
correlation obtained at the most proximal age 
comparison (24 by 36 months), Fig. 2 dem- 
onstrates the wide discrepancy between the 
mean IQs at these two ages. Thus, it is clear 
that a relatively high correlation coefficient 
between these two tests does not permit a 
direct prediction of SB scores from earlier 
Cattell scores. 

The hypothesis has been advanced by sev- 
eral investigators that the present validity of 
infant testing lies in its differentiation of the 
extremes in a population (4, 7). Because of 
the absence of extremely low scores in our 
sample, we were unable to test this hypothe- 
sis. It is possible that the Cattell provides 
some measure of maturational level, but it 
does not appear to be directly predictive of 
those intellectual functions measured at later 
ages by the SB. 

Several aspects of this problem bear fur- 
ther investigation. It is possible that an item 
analysis may produce a selection of items 
which will have better predictive value than 
those which are now in use. Also, an analysis 
of individual growth patterns might help to 
determine whether or not intellectual maturation
is a curvilinear function. The data presented
in this report will again be analyzed
when all examinations have been completed. 


Summary 


A relationship was found to exist between 
IQ and fetal maturity among children with a 
CA of approximately 12 months. However, 
the effect of prematurity on test scores com- 
pletely disappeared by 18 months of age. 
The CIIS was employed at the ages of 6,
12, 18, and 24 months, and the Revised Stanford-Binet
(Form L) at 3, 4, and 5 years.
An analysis of variance disclosed no differ- 
ences among the mean scores obtained from 
the Cattell at 12, 18, and 24 months, nor 
among the SB mean scores at 36, 48, and 60 
months. However, the differences in the scores 
at 6 months when compared with the scores 
at 12, 18, and 24 months were significant. 
Further, the over-all mean of the SB scores 
was higher and statistically different from the 
over-all Cattell mean score. This discrepancy 
may be due to the differences in the structure 
of the two tests. 

In addition to a study of individual scores, 
the data were examined by means of correla- 
tion coefficients and an analysis of variance. 
The results of these analyses showed that the 
CIIS at the age of 6 months was a poor pre- 
dictive index of intelligence. This supports the 
findings of other investigators who used dif- 
ferent psychometric instruments and empha- 
sizes the limitations involved in using the re- 
sults of the 6-month test on an individual 
basis. 


Received May 21, 1956.

Manifest Anxiety, Intelligence, and Psychopathology¹


Richard H. Dana 


University of Wisconsin-Milwaukee 


Recent studies on the relationship between 
various measures of intelligence and the Tay- 
lor Manifest Anxiety Scale (A Scale) have 
yielded contradictory results. Calvin et al.
(3), Kerrick (6), Matarazzo et al. (8), and 
Grice (5) reported a slight but significant 
negative correlation. Schulz and Calvin (11), 
Farber and Spence (4), and Mayzner et al.
(10) found no relationship whatever. Taylor 
(12) presented the homogeneity of intelli- 
gence among college students as one possible 
explanation; Klugh and Bendig (7) sug- 
gested that sampling fluctuation is respon- 
sible; Matarazzo et al. (9) were concerned 
with the criterion of intelligence and empha- 
sized that the strength of the relationship is 
only moderate. 

At least three separate but related and un- 
controlled variables emerge from this history: 
(a) the criterion: various measures of intelligence
have been employed; (b) the sample:
relatively homogeneous Ss have been used 
(college students and selected service per- 
sonnel); (c) psychopathology: presence or 
absence and degree. The present study is an 
attempt to control heterogeneity of intelli- 
gence and presence of psychopathology. 


Method 


Subjects. The Ss were 100 psychiatric aides 
and 100 outpatients who had taken the Wech- 
sler-Bellevue, Form I, and the MMPI dur- 
ing the same test administration. Sex dis- 
tribution was similar with 56 women aides 
and 57 women outpatients. The Ss were 
drawn in alphabetical order from hospital 
files. Background information (sex, age, edu- 
cation) for both groups was available. 


1 This study was conducted at the St. Louis State 
Hospital, St. Louis, Missouri.

Procedure. The MMPI answer sheets
(group administration) were hand-scored in- 
dependently by two clerks for the A scale 
and the Winne scale. The Winne Scale of 
Neuroticism, which is derived from the 
MMPI Neurotic Triad (Hy, D, Hs), was 
used to provide an independent measure of 
psychopathology. The A-scale items included 
in the MMPI were used because of evidence 
that equivalent results are obtained with 
both A-scale forms, MMPI and Biographical 
Inventory (1, 2). 

The two groups were compared statistically 
on background variables, intelligence, and 
MMPI scores. Correlations were obtained be- 
tween A-scale scores, intelligence, and back- 
ground variables. 


Results 


Equivalence of groups. There were no sig- 
nificant differences between the two groups 
on Wechsler scores and number of years of 
education (Table 1). However, the difference 
between mean ages was significant (p < .05). 
The two groups are thus similar in intelli- 
gence and education. They appear to approxi- 
mate the general, noncollege population with 
respect to these variables. 

Presence of psychopathology. The mean 
Winne scale score for the outpatient group 
exceeds the cutoff score of 11 which correctly 
identifies approximately two-thirds of neu- 
rotic Ss (14). The differences between aide 
and outpatient A-scale and Winne scores were 
significant at the < .001 level. The two 
groups thus represent different degrees of 
psychopathology and could be labeled as 
“normal” and “neurotic.” 

A-scale scores and background variables. 
The relationships between background variables
and A-scale scores were not significant

Table 1

Comparison of Variables for Aides and Outpatients

                        Aides         Outpatients
Variable              Mean    SD     Mean    SD      t      p
Wechsler-Bellevue
  Verbal              97.4   15.7    96.9   11.2    0.3    --
  Performance        101.8   14.4   100.0   13.5    0.9    --
  Full Scale         100.2   14.6    98.4   12.1    1.0    --
Age, years            34.9   12.5    31.0    9.7    2.4   <.05
Education, grade       9.9    1.8    10.2    2.0    0.3    --
Winne Scale            3.6    1.9    13.1    5.9   16.0   <.001
A Scale                9.1    6.0    23.2    8.6   14.0   <.001

for either group (Table 2). The Winne and 
A-scale scores were significantly related as 
might be expected from two measures of 
“anxiety” and the degree of relatedness was 
comparable to past research (13). Six MMPI 
items occur in both scales. 

A-scale scores and intelligence. Zero-order 
correlations were obtained between Wechsler 
and A-scale scores for the aide group. The 
correlations for the outpatient group were 
negative but not significant (Table 2). 


Discussion 


This study is essentially an example of con- 
trol applied to variables which may con- 
tribute to any relationships obtained between 
A-scale scores and intelligence. Heterogeneity 
of intelligence and degree of psychopathology 
were controlled by means of sampling pro- 
cedures and size and source of sample. 

The results suggest that intelligence and 


Table 2

Pearson r's Between A Scale and Other Variables
for Aides and Outpatients

                               Group
Variable              Aides         Outpatients
Wechsler-Bellevue
  Verbal              -.01             -.02
  Performance         -.02             -.15
  Full Scale           .00             -.17
Age, years         [illegible]         -.02
Education, grade      -.15             -.01
Winne Scale            .63              .64

Note. An r of .195 is significant at the .05 level of confidence.







Manifest Anxiety are not significantly related 
when measured by the Wechsler-Bellevue, 
Form I, and the MMPI A-scale items, re- 
spectively. The presence of psychopathology 
may contribute to the trend toward a slight
but significant negative relationship between
anxiety and intelligence. It should also be
noted that differences in age and education 
may contribute to the obtained correlations. 


Summary 


The relationship between Manifest Anxiety 
and intelligence was evaluated by means of 
a design which attempted to control such 
variables as heterogeneity of intelligence and 
presence of psychopathology. The Ss, 100 
“normal” and 100 “neurotic,” were similar in 
age and education. They were approximately 
normally distributed with respect to intelli- 
gence test scores. The results suggest that 
considerable caution must be exercised in in- 
terpreting any relationship between intelli- 
gence and Manifest Anxiety. Although no sig- 
nificant relationship was demonstrated, the 
present statistical results illustrate that faulty 
control of relevant variables may have con- 
tributed to some of the apparent significance 
of past research. 


Received May 6, 1956.

The Inhibition Process, Rorschach Human Movement
Responses, and Intelligence¹ ²


Murray Levine,³ Harvey Glass
Veterans Administration Regional Office, Philadelphia, Pennsylvania 


and Julian Meltzoff 


Veterans Administration Regional Office, Brooklyn, New York 


Recent research by Singer, Meltzoff, and 
others, centering around the Rorschach Human
Movement response (M), has provided
experimental support for a theoretical struc- 
ture (14, 26) which considers impulse delay, 
empathic motion perception, fantasy, and 
thinking to be related, and, in some respects, 
interdependent processes. The research has 
suggested that M may be considered both as 
a product and a measure of the delay func- 
tion of the ego, or of inhibition ability (8, 9, 
11, 17, 18, 19, 20, 21, 22, 23). 

In several studies, M has been shown to 
correlate significantly with general intelli- 
gence or aspects of it (1, 2, 7, 24, 27). From 
the point of view of psychoanalytic theory, 
Rapaport, Gill, and Schafer (15) and Fromm, 
Hartman, and Marschak (6) have suggested 
that intelligence test performance may be 
conceptualized in terms of ego functions. The 
correlation of M with intelligence suggests 
that the delay function of the ego, or inhibi- 
tion ability, may be directly involved in in- 
telligence test performance. The present study 
is designed to examine the relationship of a 
specific bit of intelligence test performance 
to M and to another measure of inhibition 
ability. 

The mirror-image N, the symbol for the 
number 2 in the Wechsler-Bellevue Form I 
(25) digit symbol subtest, is reproduced as 


1 From the VA Regional Offices, Philadelphia, Pa. 
and Brooklyn, N. Y. 

2 Essentially this same paper was read at the East- 
ern Psychological Ass., Atlantic City, March, 1956. 

3 Currently at Devereux Schools, Devon, Pa.

an N by approximately ten per cent of our
clinic population. Analysis of this error sug- 
gests several possibilities. Individuals who 
make this error may not make a necessary 
adjustment in an habituated motor response. 
The stimulus may be perceived correctly, but 
the error reflects poor inhibition of the motor 
act of writing the familiar N. On a perceptual
level, S may permit closure to take place too 
rapidly so that the normal N is actually per- 
ceived. On a cognitive level, S may respond 
as if there is no difference between the stimulus
as given and the normal N. At each level,
the error may be considered a function of 
an insufficient delay or control of a response 
tendency. If this analysis is correct, then Ss 
who make the error (reversers) should pro- 
duce fewer M responses than controls who do 
not make the error. Reversers should also be 
less able than controls with respect to the 
ability to inhibit an old association and rap- 
idly substitute a new one for it. 


Subjects 


The 274 Ss are veterans with a wide va- 
riety of psychiatric diagnoses, who had been 
referred for psychological testing in an out- 
patient setting. 


Procedures 


Ninety-eight reversers were selected from 
our research files on the basis of the appear- 
ance of one or more reversals of the mirror- 
image N and the fact that the Rorschach test 
had also been administered. 

One hundred controls were selected by 





42 Murray Levine, Harvey Glass, and Julian Meltzoff 


choosing the next case, in alphabetical order 
to the reverser, who did not make the error 
and who had been administered the Rorschach 
test. The Rorschach scoring of the original 
examiner was accepted in all cases. Undoubt- 
edly, there is unreliability inherent in this 
procedure, but it is offset to some extent by 
the fact that most of the examiners in our 
clinic were trained at the same university and 
in largely the same clinical centers. 

To the group above were added 27 more 
reversers and 49 more controls who had been 
administered a test of cognitive inhibition. 
The procedure, previously described by the 
authors (8, 10) briefly is as follows: A list of 
10 easy paired associates is read to S. After 
the associates are learned to a criterion of one 
perfect recitation, S is asked to respond, upon 
presentation of the stimulus word, with any 
word other than the learned associate. Cog- 
nitive inhibition time (CIT) is taken as the 
average time interval between presentation of 
the stimulus and the response for the 10 pairs. 
Since this time presumably is taken up in 
part by the process of finding a new asso- 
ciation, a measure of word association time 
(WAT) is also obtained. WAT is computed 
as the mean response time to a list of 10 
other words taken from the same source as 
the original list (15). 

All tests, Rorschach, Wechsler-Bellevue, and 
Cognitive Inhibition were administered indi- 
vidually in the same clinical setting. 


Results 


The hypothesis that reversers will produce 
fewer M responses than controls is supported 


Table 1

Percentage of Reversers and Controls Producing Less
than Two M Responses, Adjusted for
Response Total (R)

                  Reversers            Controls
R                N   Percentage      N   Percentage      t       p
Not adjusted    125     66.4        149     49.7       2.78    .005
<20              73     76.3         79     55.7       2.54    .02
>20              52     53.8         70     27.1       3.00    .003
>30              20     55.0         35     11.4       3.49    .0005








Table 2

Mean Cognitive Inhibition (CIT) and Word Association
(WAT) Times, in Seconds, for
Reversers and Controls

           Reversers        Controls
Test       N     Mean       N     Mean        t          p
CIT        27    5.83       49    4.46       2.24       .03
WAT        26    2.79       49    2.48    [illegible]    --








by the data (Table 1). A significantly greater 
proportion of reversers than controls produce 
less than 2 M. Two M was selected as a 
breaking point since it is approximately the 
median M for our total clinic population. It 
has been suggested that differences in Ror- 
schach scores may be attributable to differ- 
ences in total number of responses (R) (4). 
There is a difference in R between the two 
groups in favor of controls significant at p 
= .07. The two groups were therefore com- 
pared for M with some control for R. Under 
these conditions, reversers still produce less 
than 2 M significantly more frequently than 
controls. The difference actually becomes more 
pronounced as R increases. In the group pro- 
ducing 30 or more responses, 55 per cent of 
reversers produced fewer than 2 M while only 
11 per cent of controls have less than 2 M. 
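The comparisons in Table 1 can be approximated from the tabled percentages and Ns. The sketch below assumes a pooled-variance test of the difference between two independent proportions; the text does not say which formula was used, and the function name is hypothetical.

```python
from math import sqrt

def two_proportion_z(p1, n1, p2, n2):
    """Critical ratio for the difference between two independent proportions,
    using the pooled proportion for the standard error."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Proportions and Ns as printed in Table 1.
z_unadjusted = two_proportion_z(0.664, 125, 0.497, 149)
z_over_30 = two_proportion_z(0.550, 20, 0.114, 35)

print(round(z_unadjusted, 2), round(z_over_30, 2))
```

Under this assumption the unadjusted row and the row for the most productive Ss come out close to the tabled values of 2.78 and 3.49.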

The hypothesis that reversers will be poorer 
on a test of cognitive inhibition is also sup- 
ported (Table 2). Reversers had a mean CIT 
of 5.8 seconds while controls had a mean time 
of 4.5 seconds. This difference is significant at 
p = .03. As found in previous studies (8, 10), 
the difference in inhibition scores is appar- 
ently not simply dependent upon associative 
facility. There is no significant difference be- 
tween reversers and controls in word associa- 
tion time. 

Since color responses are often held to be 
related to impulsiveness (3, 5, 12, 15) they 
were examined in this context. Reversers, sig- 
nificantly more often than controls, have a 
Sum C = 0. Only 18.8 per cent of the con- 
trols did not produce any color responses 
while 30.4 per cent of the reversers showed 
an absence of color. This difference was significant
at p = .02. When this difference in
color productivity was tested controlling for 
R, the difference in proportions producing no
color responses was significant only for the
group producing more than 20 R. Under these 
circumstances it cannot be concluded that an 
absence of color responses is related to the 
reversal error independently of the depend- 
ence of Sum C on R. There are also no sig- 
nificant differences between reversers and con- 
trols in the production of FC, CF, or C. 
However, in all categories of color scoring, 
controls tend to be slightly more productive. 

The results were also considered in terms of 
the Erlebnistype ratio, M to Sum C. Four 
groups were formed on the basis of the me- 
dians of the distributions of M and Sum C. 
A significantly greater proportion of reversers 
than controls fell into the Low M — Low Sum 
C class. A significantly greater proportion of 
controls than reversers fall into the High M — 
High Sum C class. The examination of experi- 
ence balance demonstrates no new finding. It 
is not unlikely that the differences in Erleb- 
nistype are primarily due to the difference in 
M between reversers and controls. 

Controls, as a group, have a significantly 
higher IQ than do reversers. Reversers have 
a mean IQ of 100.76 (σ = 13.25), controls a
mean IQ of 109.46 (σ = 15.14). This difference
yields a t = 5.06 which is significant at
p = .0001. The distribution of IQs for re- 
versers is a normal one and closely approxi- 
mates Wechsler’s norms. The distribution of 
IQs for controls is skewed* and contains a 
preponderance of high IQ subjects. About 33 
per cent of controls have an IQ of 120 or 
more while only 5 per cent of reversers have 
IQs of 120 or greater. 
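The reported t for the IQ difference can be checked from the summary figures. Two assumptions are made here that the text does not state: that the groups contain 125 reversers and 149 controls (the Ns in the unadjusted row of Table 1), and that the standard error is formed from the separate group variances. The function name is hypothetical.

```python
from math import sqrt

def t_from_summary(m1, s1, n1, m2, s2, n2):
    """t for a difference between two independent means, computed from
    group means, standard deviations, and Ns (separate variances)."""
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (m1 - m2) / se

# Controls first, reversers second; Ns are assumed, see above.
t_iq = t_from_summary(109.46, 15.14, 149, 100.76, 13.25, 125)
print(round(t_iq, 2))
```

On these assumptions the result agrees with the reported value to within rounding.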


Discussion 


The data confirm the analysis of the re- 
versal error as a manifestation of poor ability 
to inhibit or delay responses. Beyond this, 
there is considerable suggestion that adequate 
inhibition ability is an important factor in 
earning a high score on the intelligence test. 
Reversers had a significantly lower mean IQ 
than did controls and this difference may not 
be wholly attributed to artifact. Any single S 


* Because of the skewness in the one distribution, 
a nonparametric test based upon variation in the 
two groups about the median of the combined 
groups was attempted. For 1 df, chi square = 39.56, 
p < .0001.

could not have lost more than one weighted
score point even if he had made the maximum 
number of errors possible. The one weighted 
point could hardly make a difference of more 
than one or two IQ points in any given in- 
stance. The obtained mean difference was al- 
most 9 IQ points. Since this is the case, it is 
reasonable to assume that other instances of 
lost IQ points were also due to failure of the 
delay mechanism. Qualitative analysis of per- 
formance on many of the subtests very likely 
would reveal other manifestations of poor in- 
hibition ability. 

There is a growing body of evidence to 
suggest inhibition ability involves a stable 
process in the person extending beyond the 
immediate stimulus situation. In terms of 
Wechsler-Bellevue performance, this would in- 
dicate there are processes in the person ex- 
tending across particular subtests. The diffi- 
culty in validating “patterns” with the 
Wechsler-Bellevue (13) may very well have 
resulted from the attempt to impose arbi- 
trary meanings on the subtests in place of 
examining manifestations of definable ego 
processes involved in test performance. 

Rorschach (16) noted a relationship be- 
tween M and general intelligence and sev- 
eral studies since have reported significant 
correlations of M with various aspects of in- 
telligence (1, 2, 7, 24, 27). However, the 
rationale underlying such correlations is no- 
where clearly stated. The present findings 
support the rationale that both M and impor- 
tant aspects of intelligence test performance 
involve the delay function of the ego. Schul- 
man’s findings (17) relating M to abstraction 
ability and his suggestion that abstract think- 
ing reflects a delaying mechanism are in sup- 
port of such a view. Our present data reveal 
another specific aspect of intelligence test 
performance which seems to involve inhibi- 
tion ability, while M has already been theo- 
retically and experimentally related to such a 
function. This hypothesis would direct us to 
seek further relationships between operation- 
ally defined and experimentally meaningful 
measures of ego functions and specific aspects 
of intelligence test performance. While one 
may look at this evidence as an approach to 
the so-called “nonintellective” factors of intelligence,
eventually it may be possible to relate
concepts of intelligence and intelligent behavior
in a general theory of personality.

The present findings also lend confirmation 
to previous studies (5, 8, 20) which have 
failed to support the relationship of color re- 
sponses to impulsiveness. Our present data do 
not reliably support the conclusion that color 
responses are related to the accurate execution
of the reversed N. The trends in the data
suggested another hypothesis about color re- 
sponses which eliminates the necessity for 
tortuous reasoning relating color to emotion- 
ality. Both the color on the Rorschach cards 
and the mirror-image quality of the Wechsler 
symbol may be conceptualized as environ- 
mental features demanding attention. In the 
case of the Rorschach, the color either stands 
out as figure on the cards, or, in appearing 
after a long series of uncolored cards, it in- 
troduces a striking difference difficult to ig- 
nore. Similarly the instructions of the Wech- 
sler subtest explicitly require accuracy, and 
the mirror-image quality cannot be ignored 
without S failing this aspect of the task. From 
this point of view, the suggestive relationship 
between absence of color responses and writ- 
ing N reflects a process of inadequate coming 
to terms with an outstanding feature of the 
environment. In both instances, in one way 
or another, S is ignoring or avoiding the par- 
ticular environmental feature. The hypothesis 
that color responses reflect adaptive respon- 
siveness to attention demanding environmen- 
tal stimuli is one capable of empirical test. 


Summary 


Two groups of veterans with a wide va- 
riety of psychiatric diagnoses were differenti- 
ated on the basis of whether or not the Ss had 
reproduced the reversed N of the Wechsler-Bellevue
digit symbol subtest as an N. Analysis
of the error led to the hypothesis that the
error was a function of an insufficient delay 
or control of a response tendency. This hy- 
pothesis was supported when it was shown 
that reversers (those who made the error) 
produced significantly fewer M responses than 
controls (those who did not make the error). 
Reversers were also shown to be less able 
than controls in the ability to inhibit an old 
association and rapidly substitute a new one
for it. The mean IQ of the reversers was significantly
lower than the mean IQ of controls.

The findings provide further evidence of 
the general significance of the inhibition proc- 
ess, as measured by M and by specific tasks, 
in that manifestations of the inhibition proc- 
ess can be identified in intelligence test per- 
formance. These data suggest it may be fruit- 
ful to attempt to subsume concepts of intelli- 
gence and intelligent behavior under more 
general personality theory. 


Received April 25, 1956.

Rorschach Response Characteristics as a Function
of Color and Degree of Emotional Constriction¹


Arthur Canter 
Johns Hopkins University School of Medicine 


According to Rorschach test rationale the 
reaction to color is dependent upon the emo- 
tional control of the responder. The achro- 
matic-chromatic studies generally have not 
controlled the effects of this relationship. The 
purpose of this study was to attempt such a 
control to examine the role of color. 

A response-defined scale of emotional con- 
striction (EC) was constructed. From a large 
pool of college student subjects, 38 pairs were 
drawn, matched on the basis of their EC 
scores, age, sex, and verbal ability. One mem- 
ber of each pair was assigned at random to 
either Group A or B, forming two groups of 
equal size having almost identical distribu- 
tions of EC scores. Each group was further 
subdivided into three levels of EC scores: 
high (Level I), moderate (Level II), and 
low (Level III). Group A subjects were given 
the standard Rorschach test and Group B, an 
achromatic duplication of the test. Either 
Rorschach was administered individually. 
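The assignment step of this design can be sketched as follows. The subject labels, the function name, and the use of a seeded generator are all hypothetical; the report does not describe how the random assignment was carried out.

```python
import random

def assign_pairs(pairs, seed=None):
    """Randomly send one member of each matched pair to Group A
    and the other member to Group B."""
    rng = random.Random(seed)
    group_a, group_b = [], []
    for first, second in pairs:
        if rng.random() < 0.5:          # coin flip within the pair
            first, second = second, first
        group_a.append(first)
        group_b.append(second)
    return group_a, group_b

# 38 hypothetical matched pairs, labeled s1 ... s76.
pairs = [(f"s{2 * i + 1}", f"s{2 * i + 2}") for i in range(38)]
group_a, group_b = assign_pairs(pairs, seed=0)
```

Because the coin flip is made within each pair, the two groups are always of equal size and keep the matching on EC score, age, sex, and verbal ability.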

The following hypotheses were made: (a) 
R, initial RT (reaction time), and F + % 
for 10 cards and for five “color” cards would 
differ between Groups A and B with the greatest
differences between EC Level III pairs of
subjects. (b) The color scores for Group A


1 An extended report of this study may be ob-
tained without charge from Arthur Canter, Phipps 
Psychiatric Clinic, Johns Hopkins Hospital, Balti- 
more 5, Md., or for a fee from the American Docu- 
mentation Institute. Order Document No. 5088 from 
ADI Auxiliary Publications Project, Photoduplica- 
tion Service, Library of Congress, Washington 25, 
D. C., remitting in advance $1.75 for microfilm or 
$2.50 for photocopies. Make checks payable to Chief, 
Photoduplication Service, Library of Congress. 


subjects would be related to their EC scores 
in accordance with the color-affect hypothe- 
sis. (c) If the R for the two Rorschach groups 
were found to be equivalent, the achromatic 
subjects would have more shading scores than 
the standard subjects. (d) This difference in 
(c) would disappear if the shading and color 
scores for Group A were combined in terms 
of form-dominance and compared with the 
shading scores of Group B. (e) The variances 
of the Rorschach scores would be higher in
EC Level III than Level I. 

Hypothesis (a) was tested by the Median 
Rank test for each EC level. No differences 
were found between Groups A and B on R 
and RT for each EC level. The analyses of 
F + % were also either not significant or con- 
tradictory leading to a rejection of hypothesis 
(a). No significant relationships between EC 
status and color scores were obtained. The 
Median test analysis of (c) and (d) yielded 
positive results. Group B gave more shading 
scores (p < .02) but this difference disap- 
peared when the shading and color scores for 
Group A were combined as predicted. This 
suggests that color and shading are used simi- 
larly. Hypothesis (e) was also supported with 
higher variances for the Rorschach scores on 
Level III on both types of Rorschach. Thus 
color is apparently not as important to the 
variability of Rorschach performance as emo- 
tional constriction per se. 

As found in other studies, the role of color 
and the color-affect hypothesis seem to have 
been overvalued for the Rorschach test. 


Brief Report. 
Received November 9, 1956. 
Rorschach Scores as a Function of Four Factors¹


Conrad Consalvi 


Ohio State Reformatory 


and Arthur Canter 
Johns Hopkins University School of Medicine 


The purpose of this study was to evaluate 
the factorial composition of the Rorschach 
test in terms of intelligence and the formal 
processes involved in the scores. Several stud- 
ies have indicated that intelligence, verbal 
fluency, and productivity play significant 
roles in the occurrence of the scoring cate- 
gories (1, 2, 6, 12). It has also been sug- 
gested that Rorschach determinants may be 
organized according to the dominance or lack 
of dominance of form without regard to the 
differential use of the stimulus qualities of 
the inkblots (9, 11, 14). The data of the 
factorial studies have had limited generality 
because the samples used often represented 
restricted populations with respect to intelli- 
gence or behavioral characteristics. Since little 
effort has been made to distinguish between 
general intelligence and verbal ability in such 
studies, it is not possible to evaluate the ef- 
fects of the latter independent of general in- 
telligence. 

The present study was designed to investi- 
gate the following major questions: 


1. Does verbal ability operate differentially 
from relatively culture-free general intelli- 
gence on the Rorschach test? 

2. What are the relationships between the 
intelligence factors and the scoring categories 
of the Rorschach? 


1 This report is based on the data from a Master’s 
thesis submitted by Conrad Consalvi to Vanderbilt 
University under the direction of Dr. Arthur Canter, 
then of the Vanderbilt faculty and Mr. Ray Norris 
of George Peabody College for Teachers. The au- 
thors wish to express their appreciation to Mr. Nor- 
ris for his invaluable aid in the statistical design and 
analysis of the data. 


3. What is the effect of combining Ror- 
schach scores in terms of form-dominance? 
Logical considerations as well as the findings 
of other studies suggest that bright and 
achromatic color might be combined and that 
shading scores not used as colors might be 
combined in terms of form-dominance. 

4. Finally, do the movement scores repre- 
sent a different factorial composition that 
would support the traditional views of sepa- 
rating them from each other as well as from 
the “external” determinants? 


Procedure 


The subjects were 45 adults (22 males and 
23 females) between the ages of 20 and 36, 
with a mean age of 27 years. The educational 
level ranged from the 6th grade to profes- 
sional college, with the median grade com- 
pleted as the 12th. Approximately 24 per cent 
had one or more years of college. Occupation- 
ally, the group was diverse, including house- 
wives, clerical workers, firemen, nurses, hos- 
pital aides, and mechanics in various trades. 
None of the subjects was a student at the 
time of testing. Included in the sample were 
six inmates from a local institution for the 
feebleminded. This was considered necessary 
to sample the lower end of the intelligence 
range. In no case was there evidence of be- 
havioral or central nervous system disorder. 
Except for the intellectual limitations of the 
institutional cases, the sample was consid- 
ered normal in the ordinary sense of the term 
rather than the psychiatric one in which nor- 
mality may represent an ideal of adjustment. 

Three tests were administered to each 
subject: the Raven’s Progressive Matrices 
(1938), the Vocabulary test of the Wechsler-
Bellevue Scale Form I, and the Rorschach
test. The Raven’s test was chosen as a gen- 
eral nonverbal test having wide applicability 
and freedom from contamination of speed 
factors. The Vocabulary test was chosen as 
a measure of verbal ability which correlates 
highly with other verbal tests of the Wech- 
sler scale (10). Both intelligence tests were 
administered individually by the usual standard
procedures. The inquiry method for the
Rorschach entailed the use of nonleading 
questions by the examiner to establish the 
determinant scores. No determinant was pre- 
sumed to be operative unless stated by the 
subject without suggestion from the examiner. 
All responses were scored after Klopfer (5). 

Fourteen scores were used for the factor 
analysis. These included the Raven’s test 
score, the raw score of the Vocabulary test 
and the following Rorschach categories or 
combinations: W+, M+, FC + FC’, FM + m,
A%, C + C’ (includes CF and C’F), FK + Fc
+ Fk, D (includes Dd and d), #Con (number
of content categories, based on the schema
of Phillips and Smith [8]), K + c + k
(includes KF, cF, and kF), F%, and R
(number of responses). The omission of form-level
rating scores was deliberate in view of
their high degree of subjectivity and the con- 
troversy centered about their use (4, 9). The 
W and M scores represented plus values in 
that obviously poor quality responses were 
not included. Less equivocal judgments were 
involved in the decisions for poor responses
as opposed to assigning various degrees of 
positiveness used in a form-level rating 
scheme. A basal rating of 1.0, using Klopfer’s 
criteria (5) would correspond to the plus 
values of M and W in the present study. 

The raw scores of all Rorschach variables 
were converted to T scores since many of
their distributions were skewed. The vari- 
ables were intercorrelated by the Pearson 
product-moment formula and the resulting 
14 × 14 matrix factorized by the centroid
method. The highest correlation in each col- 
umn was used as the communality value, 
with replacement for each factor extracted 
(7). The centroid factors were subjected to 
two series of orthogonal, single plane rota- 
tions. Rotation was stopped when it became 
apparent that continuing would not improve 
the meaningfulness of the factors. 
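The extraction step described above can be sketched in a few lines. This is a minimal illustration only, not the authors' worked computation: the function names are hypothetical, the sign-reflection rule is a simplified one (classical hand procedures differ in detail), and the three-variable matrix is a toy example rather than data from this study.

```python
import math


def with_highest_r_communalities(r):
    """Copy r, placing the largest absolute off-diagonal value of each
    column on the diagonal as the communality estimate."""
    n = len(r)
    out = [row[:] for row in r]
    for j in range(n):
        out[j][j] = max(abs(r[i][j]) for i in range(n) if i != j)
    return out


def centroid_factor(r, tol=1e-12):
    """Extract one centroid factor from a symmetric matrix whose diagonal
    already holds communality estimates (simplified reflection rule)."""
    n = len(r)
    signs = [1] * n
    for _ in range(2 ** n):  # generous cap on the number of reflections
        sums = [sum(signs[i] * signs[j] * r[i][j] for i in range(n))
                for j in range(n)]
        worst = min(range(n), key=lambda j: sums[j])
        if sums[worst] >= -tol:
            break
        signs[worst] = -signs[worst]  # reflect the most negative variable
    total = sum(sums)
    if total <= tol:
        return [0.0] * n  # nothing left to extract
    root = math.sqrt(total)
    return [signs[j] * sums[j] / root for j in range(n)]


def residual(r, loadings):
    """Remove the extracted factor: r_ij - a_i * a_j."""
    n = len(r)
    return [[r[i][j] - loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]


# Toy matrix generated by a single factor with loadings .8, -.6, .5.
toy = [[0.64, -0.48, 0.40],
       [-0.48, 0.36, -0.30],
       [0.40, -0.30, 0.25]]
first = centroid_factor(toy)
print([round(a, 3) for a in first])  # loadings close to 0.8, -0.6, 0.5
```

For a real matrix one would apply `with_highest_r_communalities` before each extraction and call `residual` after it, repeating until the desired number of factors has been taken out.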


Results and Discussion 


The original matrix of intercorrelations is 
given in Table 1. The centroid factor loadings 
and the rotated loadings are presented in 
Tables 2 and 3. Four factors were extracted 
with their compositions identified in Table 3. 
Loadings of .300 or higher are interpreted as 
significant. 
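The .300 criterion can be applied mechanically to the rotated loadings of Table 3. The sketch below is an illustration under stated assumptions: the values are transcribed from Table 3 in thousandths, the C + C’ loading on Factor I’ is read as -044, and the helper names are hypothetical.

```python
# Rotated loadings from Table 3, in thousandths (decimal points omitted,
# as in the table). Column order: I', II', III', IV'.
ROTATED = {
    "Matrices":   (882,   -1,   -1,   -1),
    "Vocabulary": (798,   80,  195, -143),
    "W+":         (323,  320,  335,  432),
    "M+":         (516,  -34,  345,  463),
    "FC+FC'":     (307,  302,  506, -384),
    "FM+m":       (297,   79,  498,  415),
    "A%":         (-362, -466, -300, -204),
    "C+C'":       (-44,  812,  316, -218),  # assumed reading of a garbled entry
    "FK+Fc+Fk":   (599,   44,  304, -248),
    "D":          (105, -193,  800, -183),
    "#Con":       (320,  415,  742,   78),
    "K+c+k":      (38,   552,  -20,   75),
    "F%":         (-576, -581, -114, -278),
    "R":          (187,    0,  956,   24),
}
FACTORS = ("I'", "II'", "III'", "IV'")
CUTOFF = 300  # loadings of .300 or higher are treated as significant


def significant_variables(k):
    """Names of variables whose absolute loading on factor k meets the cutoff."""
    return [name for name, row in ROTATED.items() if abs(row[k]) >= CUTOFF]


def communality(name):
    """Sum of squared loadings across the four factors."""
    return sum((v / 1000) ** 2 for v in ROTATED[name])


for k, label in enumerate(FACTORS):
    names = significant_variables(k)
    print(label, len(names), ", ".join(names))
```

On these values the counts are nine, seven, ten, and four variables for Factors I’ through IV’, which agrees with the numbers cited in the discussion of each factor.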


Factor I' 


Factor I’, with nine variables, clearly seems 
to be the intelligence factor in this analysis. 
It has positive loadings on both Matrices and 


Table 1

Original Matrix of Intercorrelations *

Variable          2    3    4    5    6    7    8    9   10   11   12   13   14
 1. Matrices    713  288  468  190  207 -377 -063  583  133  335  093 -442  210
 2. Vocabulary       259  420  445  249 -328  131  500  303  439  117 -484  375
 3. W+                    529  198  420 -438  278  292  025  509  224 -459  512
 4. M+                         223  467 -417 -036  260  240  460 -088 -542  414
 5. FC+FC’                          191 -288  491  425  515  535  047 -448  585
 6. FM+m                                 -332  101  221  436  471  072 -600  555
 7. A%
 8. C+C’
 9. FK+Fc+Fk                                            386  351 -022 -396  411
10. D                                                        549 -166  000  748
11. #Con                                                          258 -433  819
12. K+c+k                                                              -269  178
13. F%                                                                      -344
14. R

* Correlations of .296 are significant at the .05 level (two-tailed test).
Table 2

Centroid Factor Loadings

Variable           I    II   III    IV    h²
 1. Matrices     559  -539  -348  -230   777
 2. Vocabulary   653  -448  -114  -247   701
 3. W+           591   185  -287   197   505
 4. M+           578  -183  -329   353   600
 5. FC+FC’       616  -021   395  -236   592
 6. FM+m         581   037  -138   396   515
 7. A%          -634  -207   184   055   482
 8. C+C’         492   598   236  -396   812
 9. FK+Fc+Fk     574  -371   107  -190   515
10. D            493  -178   572   349   724
11. #Con         861   202   182   117   829
12. K+c+k        244   401  -159  -257   312
13. F%          -702  -162   443   215   762
14. R            819   152   405   305   951

Table 3

Rotated Factor Loadings

Variable          I’   II’  III’   IV’    h²
 1. Matrices     882  -001  -001  -001   778
 2. Vocabulary   798   080   195  -143   702
 3. W+           323   320   335   432   506
 4. M+           516  -034   345   463   601
 5. FC+FC’       307   302   506  -384   589
 6. FM+m         297   079   498   415   515
 7. A%          -362  -466  -300  -204   480
 8. C+C’        -044   812   316  -218   809
 9. FK+Fc+Fk     599   044   304  -248   515
10. D            105  -193   800  -183   722
11. #Con         320   415   742   078   831
12. K+c+k        038   552  -020   075   312
13. F%          -576  -581  -114  -278   760
14. R            187   000   956   024   949





Vocabulary. Both variables show no signifi- 
cant loadings on any of the other factors. 
This supports the findings of Williams and 
Lawrence (11). The Vocabulary score corre- 
lates or fails to correlate with the Rorschach 
variables to about the same degree as does 
the Matrices with the exception of FC + FC’. 
Although FC + FC’ correlates higher with 
Vocabulary than with Matrices, the magni- 
tudes of the correlations are too low to be of 
practical value. Thus no support can be given 
to the assertion that verbal ability operates 
independently of general intelligence to affect 
Rorschach scores. 
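The comparison of the two intelligence measures can be checked directly against the first two rows of Table 1. The following is a small sketch under the assumption that the row values are transcribed correctly from the table (in thousandths); the variable names are illustrative.

```python
# Correlations of each Rorschach variable with Matrices and with Vocabulary,
# taken from rows 1 and 2 of Table 1 (thousandths, decimal points omitted).
LABELS = ["W+", "M+", "FC+FC'", "FM+m", "A%", "C+C'",
          "FK+Fc+Fk", "D", "#Con", "K+c+k", "F%", "R"]
MATRICES = [288, 468, 190, 207, -377, -63, 583, 133, 335, 93, -442, 210]
VOCABULARY = [259, 420, 445, 249, -328, 131, 500, 303, 439, 117, -484, 375]

# Difference (Vocabulary minus Matrices) for every Rorschach variable.
gaps = {name: v - m for name, m, v in zip(LABELS, MATRICES, VOCABULARY)}

# The variable on which the two intelligence measures disagree most.
largest = max(gaps, key=lambda name: abs(gaps[name]))

for name in LABELS:
    print(f"{name:10s} {gaps[name]:+5d}")
print("largest gap:", largest, gaps[largest])
```

On these figures the largest discrepancy falls on FC + FC’ (.445 with Vocabulary against .190 with Matrices), the exception singled out in the text.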

The Rorschach variable having the highest 
loading on the intelligence factor was the
form-dominant shading category (FK + Fc
+ Fk). It would seem that interpreting three
dimensions or well-configured texture from 
shades of gray requires more “abstracting” 
ability than using form alone or colored 
shapes without such further elaboration. Thus 
interpretations which take into account the 
intellectual factors in the use of FK, etc., 
would have support. The loadings of M + 
and W+ are consistent with traditional
views about their relationship to intelligence. 
The use of the individual scores to predict 
intelligence appears to be of doubtful value 
since none of the correlations with the intelligence
scores is sufficiently high. However,
combining the scores in a multiple regression 
equation seems to offer a useful approach. 
The multiple correlation between Matrices
and the four variables showing positive loadings
in Factor I’ was determined as .675. The
regression equation derived from the data
was: $Z_m = -.06 z_1 + .36 z_2 + .49 z_3 + .03 z_4$,
where $Z_m$, $z_1$, $z_2$, $z_3$, and $z_4$ represent the
standard scores for Matrices, W+, M+, FK + Fc
+ Fk, and #Con, respectively.
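As a rough check on the reported multiple correlation, the squared multiple correlation equals the sum of each standardized weight times the predictor's correlation with the criterion. The sketch below rests on an assumption that should be stated plainly: the four weights are paired, in order, with W+, M+, FK + Fc + Fk, and #Con (the four categories named in the summary), and the correlations with Matrices are read from row 1 of Table 1.

```python
import math

# Standardized regression weights from the equation in the text.
# ASSUMPTION: the pairing of weights with predictors follows the order
# W+, M+, FK+Fc+Fk, #Con.
BETAS = {"W+": -0.06, "M+": 0.36, "FK+Fc+Fk": 0.49, "#Con": 0.03}

# Correlations of each predictor with Matrices (Table 1, row 1).
R_WITH_MATRICES = {"W+": 0.288, "M+": 0.468, "FK+Fc+Fk": 0.583, "#Con": 0.133}

# R squared = sum of beta_i * r_i for standardized variables.
r_squared = sum(BETAS[k] * R_WITH_MATRICES[k] for k in BETAS)
multiple_r = math.sqrt(r_squared)

print(f"R^2 = {r_squared:.3f}, R = {multiple_r:.3f}")
print(f"reported R = .675, reported R^2 = {0.675 ** 2:.3f}")
```

Under this pairing the recomputed R is about .66, close to the reported .675; the small gap is presumably due to the two-decimal rounding of the published weights.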

The two variables having negative loadings 
on the intelligence factor, A% and F%, also 
lend support to traditional views. It appears 
that the dull individual may give a higher 
proportion of form-determined responses be- 
cause he is less able to analyze his percepts, 
rather than because the percepts differ ma- 
terially from those of brighter subjects. The 
concept of constriction, in the emotional 
sense, would not be necessary to account for 
the extremely high F% record in such cases. 
However, it seems likely that if a bright subject
gives almost all Fs, factors other than
intelligence are crucial. The concept of con- 
striction may be more appropriately used in 
such instances. These findings also tend to 
argue against viewing the extremely high F% 
records of subjects having below average in- 
telligence as reflecting a barrenness of per- 
sonality without extra-Rorschach evidence. 


Factor II’ 


Seven variables load on Factor II’ which 
may be tentatively designated as “low-form.” 
Other studies have proposed that low-form 
involves a lack of perceptual control (11, 13), 
which carries with it some interpretive con- 
notations which the present study was not 
designed to evaluate. The negative loadings of
A% and F% are consistent with the low-form
designation. The fact that FC + FC’
has a positive loading is inconsistent but it is 
relatively low (.302) and may represent an
artifact. It might be considered to reflect the 
problem that is involved in the scoring of 
FC versus CF. In such instances, when both 
form and color are used, the nondirective in- 
quiry method requires that the subject indi- 
cate which is dominant. It is conceivable that 
there is a highly arbitrary solution in many 
instances with the direction of choice depend- 
ing upon factors unknown to the examiner. 
One method of resolving this problem may be 
found in the technique used by Baughman in 
which substitute cards are offered to the sub- 
ject to test the importance of the various 
stimulus qualities (3). 

The identification of a low-form factor 
supports the use of form-dominance as a di- 
mension without the breakdown into the tra- 
ditional units such as C, K, k, c, etc. The as- 
sociational content that may seem strikingly 
different among the responses making use of 
the various qualities may still lend them- 
selves to analysis without reference to the 
determinant score. This would be consistent 
with much of actual clinical practice (8, 9). 


Factor III’ 


Factor III’, with ten variables loading on 
it, seems to represent productivity. It has ap- 
peared in most factorial studies and R has 
been shown to correlate significantly with 
most Rorschach scores (9, 11, 13). The vari- 
ables having the highest loadings, R, D, and 
#Con, serve to identify the factor. The much 
higher loading of D than of W would be expected
a priori. It is interesting to note that R 
failed to load on the intelligence factor as 
well since it is often taken as an indicator of 
high intelligence in the absence of special per- 
sonality or behavioral characteristics such as 
mania. 

It is apparent that productivity plays a 
greater role for the form-color determinants 
than for the form-shading ones. The latter 
depend more upon intellectual factors which 
may partly account for their relative infre- 
quency. This is not as clear-cut in the case 
of the low-form color and shading responses, 
neither of which has a loading on the intelligence
factor. Low-form color responses are
also more influenced by productivity than are
low-form shading responses. This differential 
effect of productivity upon color responses as 
contrasted to shading responses may be indicative
of the accidental nature of the Rorschach
blots. It should be apparent that the
total number of natural divisions (i.e., D and d) of the
five colored cards is greater than that for the
five achromatic cards. The distribution of
the configurational divisions of the blots cre- 
ates limiting factors in the frequency poten- 
tial of color versus shading responses. The 
use of ratio scores of high to low dominance 
of form, such as FC: CF + C, offers compli- 
cations for interpretation not usually consid- 
ered since productivity and intelligence fac- 
tors play different roles for each variable. It 
is also apparent that correcting for produc- 
tivity by dividing the scores by R may not 
be as effective for other scores as it is for F% and A%,
both of which seem fairly well controlled in 
this way. 
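The correction for productivity mentioned here is simply a ratio of a category count to R. A minimal sketch follows; the function name and the two protocols are hypothetical and serve only to show that a percentage score is unchanged when every count doubles along with R.

```python
def percent_of_responses(count, total_r):
    """Express a category count as a percentage of R (total responses)."""
    if total_r <= 0:
        raise ValueError("R must be positive")
    if not 0 <= count <= total_r:
        raise ValueError("count must lie between 0 and R")
    return 100.0 * count / total_r


# HYPOTHETICAL protocols, not data from the study: the longer record has
# twice the responses in every category.
short_record = {"R": 20, "F": 10, "A": 8}
long_record = {"R": 40, "F": 20, "A": 16}

for record in (short_record, long_record):
    f_pct = percent_of_responses(record["F"], record["R"])
    a_pct = percent_of_responses(record["A"], record["R"])
    print(f"R={record['R']:2d}  F%={f_pct:.0f}  A%={a_pct:.0f}")
```

The point made in the text is that such a division holds productivity constant reasonably well for F% and A%, whereas scores that depend differently on productivity and intelligence may not be equated by the same step.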


Factor IV' 


There are only four variables with significant
loadings in Factor IV’. It may be identified
as Movement, as it has been designated in a
previous factorial study (11). The loading of
W + might be taken as an indication that 
good Ms are frequently good Ws, and their 
shared role in the intelligence factor should 
be noted. The composition of Factor IV’ ar- 
gues against considering FM and M as rep- 
resenting fundamentally different appercep- 
tive processes. The only support for their 
separation as scores comes from the different 
loadings each has on intelligence. It appears 
that as one goes up the scale of intelligence, 
M increases at the expense of FM. The in- 
terpretation of FM as representing impulses 
for immediate gratification (5) may not be 
necessary to account for the changes in 
FM:M proportions with age and sophistication.
The relative frequencies of H and A responses
also have to be considered, since each
type of movement perception is generally de- 
pendent upon the occurrence of human or 
animal associations. One might profitably ex- 
amine what happens when individuals of dif- 
ferent levels of intelligence and mental age 
are faced with configurations so designed as
to be highly provocative of animal or human 
features only. 
The findings of an earlier study (13) reporting
a significant correlation between FC
and M were not borne out. The negative load- 
ing of FC + FC’ in Factor IV’ offers some 
evidence for placing movement and color on 
opposite poles, although the role of FC in 
the present study may have been distorted 
as already discussed. Whether Sum C is the 
most adequate way to represent the color 
pole was not tested and remains problematical 
(9). The interpretive meanings that would 
be useful in considering M and FM along 
similar dimensions also remain to be studied.
One possibility suggested is that both M and 
FM represent the same factor at different 
levels of sophistication. 


Summary and Conclusions 


A group of behaviorally normal adults 
representing a broad range of intelligence, 
education, and occupation were given the 
Raven’s Progressive Matrices, the Vocabulary 
test of the Wechsler-Bellevue Scale Form I, 
and the Rorschach test. The scores were sub- 
jected to a factor analysis resulting in the 
extraction of four factors identified as: intel- 
ligence, productivity, low-form, and move- 
ment. The evaluation of the factorial com- 
positions suggests the following: 

1. Both the verbal and nonverbal meas- 
ures of intelligence affect Rorschach scores 
in a similar manner and help identify a single 
general intelligence factor in the Rorschach. 
A correlation based on the multiple regres- 
sion equation between the Matrices scores 
and four Rorschach categories (M+, W +, 
FK + Fc + Fk, and number of content cate- 
gories) yields a value which accounts for 
about 45 per cent of their common variance. 
This provides a limited but potentially use- 
ful predictor of intelligence from the Ror- 
schach test. 

2. Productivity is minimally related to in- 
telligence. This factor is more dependent upon 
the extent to which a subject uses part as op- 
posed to whole areas of the blots and other 
nonintellectual and situational variables. Cor- 
recting for the effects of productivity by di- 
viding by the number of responses is not 
equally satisfactory for all variables, although 
A and F scores seem adequately corrected in 
this manner. 
3. There is a factorial similarity among
the high form-dominant color and shading 
scores as well as a unique factor of low form- 
dominance which includes both color and 
shading. This suggests that the traditional 
method of separating determinants into the 
various color and shading categories may be 
unnecessary. 

4. Movement may be regarded as a sepa- 
rate factor which includes both M and FM 
+m within it. The chief differentiation be- 
tween the two major movement categories 
appears in the finding that M loaded on the 
intelligence factor while FM did not. 


Received April 20, 1956. 
Pitfalls in Interpretation of Parental Symbolism
in Rorschach Cards IV and VII 


Sol Charen 


Montgomery County Mental Health Clinic, Rockville, Maryland¹


There is an accepted practice among Ror- 
schach workers to think of Cards IV and VII 
as symbolizing the father and mother figures 
respectively (1, 3, 4, 7, 8, 9). Thus, inter- 
pretation to these two cards becomes a mat- 
ter of treating responses made to them as in- 
dicative of the subject’s attitude to his par- 
ents when he was a child. The rationale for 
such interpretation is based on empirical evi- 
dence or theoretical reasoning. Bochner and 
Halpern state of Card IV, “The heavy male 
figure may suggest the father or authority in 
general; this may be pleasant or unpleasant. 
Its dark quality and overwhelming character 
are particularly disturbing to those for whom 
parental authority is still an unsolved prob- 
lem” (3, p. 81). And for Card VII, they add, 
“The two female faces or even female figures 
(in reverse position ‘dancing girls’), as well 
as the generally soft, light quality, give this 
card a feminine quality, frequently with ma- 
ternal implications” (3, p. 82). 

A search of the literature actually discloses 
only three articles which offer experimental 
evidence to substantiate the possibility that 
Cards IV and VII can be interpreted in this 
manner. Two of these are quoted as definitive 
(1, pp. 94-95, 99-100; 7) but they still do 
not offer proof these two cards are univer- 
sally seen as father and mother symbols. The 
most widely cited research is that of Meer 
and Singer (8). Their subjects were 50 col- 
lege fraternity men who were asked to select 
two cards, one of which could be considered 
a “father card” and the other a “mother 
card.” Nine chose Card II and twelve Card 
IV as the father card; ten chose Card VII 


1 Now at Child Guidance Service, Jewish Social 
Service Agency, Washington, D. C. 


and ten Card X as the mother card. The re- 
sults were statistically significant. It is of in- 
terest to note (in view of the statement by 
Bochner and Halpern about the disturbing 
quality of Card IV) that those who chose 
Cards IV and VII gave clinical evidence, ac- 
cording to Meer and Singer, that they were 
fond of their parents. 

To use Card IV as a symbol of the father 
or authority figure, on the basis of this re- 
search, would mean that twelve out of fifty 
times the interpretation would be justifiable. 
Thirty-eight out of fifty times the use of such 
symbolism would result in error. The same 
reasoning holds true for the hypothesis that 
Cards VII and X are mother-figure repre- 
sentations. 
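The arithmetic behind this argument can be laid out explicitly from the counts reported for Meer and Singer's 50 subjects. The sketch below uses only those counts; the function name is illustrative.

```python
# Number of Meer and Singer's 50 subjects choosing each card,
# as reported in the text above.
N_SUBJECTS = 50
CHOICES = {
    ("father", "II"): 9,
    ("father", "IV"): 12,
    ("mother", "VII"): 10,
    ("mother", "X"): 10,
}


def fixed_meaning_outcome(parent, card):
    """Return (times justified, times in error, proportion justified)
    if the card were always read as a symbol of that parent."""
    agree = CHOICES[(parent, card)]
    return agree, N_SUBJECTS - agree, agree / N_SUBJECTS


for parent, card in CHOICES:
    agree, miss, rate = fixed_meaning_outcome(parent, card)
    print(f"Card {card} as {parent} card: "
          f"{agree} justified, {miss} in error ({rate:.0%})")
```

For Card IV this gives 12 justified interpretations against 38 errors, or 24 per cent; for Cards VII and X the figure is 20 per cent each.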

Rosen (11) repeated the experiment of 
Meer and Singer, using as subjects 180 uni- 
versity psychology students. While statisti- 
cally significant results were again obtained 
for Cards IV and VII, there were such marked 
individual differences as to the symbolic mean- 
ings of all ten cards that gross errors could 
be made if only these two cards were accepted 
as universal symbols of parental figures. 

The third article in the literature dealing 
with this topic is that of Hirschstein and 
Rabin who recently claimed additional sub- 
stantiation of the symbolic values of Cards 
IV and VII as a result of their study of 
adolescent delinquents. Their main hypothe- 
sis was “individuals who are adolescent de- 
linquents and in whose early background there 
were no significant mother or father figures 
would react more readily and more easily to 
these cards (IV and VII) than would a simi- 
lar group of delinquents who grew up in the 
standard family situation and who had the 
opportunity of acquiring concepts of father
and mother figures, disturbing as they may 
be” (7). The hypothesis was verified. While 
response times were consistently slower to all 
cards for the delinquents with parents, only 
to Cards IV and VII were there statistically 
significant differences between them and the 
delinquents without parents. 

A theoretical criticism of the hypothesis 
made is that it is difficult to conceive of any 
person, unless he is schizophrenic, who does 
not have a parental surrogate of some sort. 
The mere fact that natural parents were lack- 
ing for the one group is not in itself proof to 
support the hypothesis investigated. The au- 
thors did not indicate whether deep analysis 
of these adolescents showed the one group 
lacking to a greater degree the identification 
or introjection of a parental figure. 

Schachtel, in his Rorschach investigation of 
500 juvenile delinquents, reported that they 
differed from a matched group of nondelinquents
of the same socioeconomic class primarily
in their attitude to authority.

The most important consideration ... (in deter- 
mining whether or not a boy was delinquent) was 
whether or not the boy showed much dependence 
on, and fear of, authority. The more such fear and 
dependency had become part of the character struc- 
ture and the commands and prohibitions of the sig- 
nificant authoritative adults had been internalized, 
the more likely the boy would not become delin- 
quent. Internalization of authority (or to use Freud’s 
terminology, formation of a strong super-ego) thus 
was the most decisive single factor or rather constel- 
lation of factors in making the judgments in delin- 
quency and non-delinquency (12, p. 148). 


In view of Schachtel’s hypothesis, which he 
substantiated in making only two errors in 
500 matchings, one can find no basis for 
Hirschstein and Rabin’s hypothesis. The sta- 
tistically significant differences to Cards IV 
and VII found in their research are proof of 
some dissimilarity between their two groups, 
but not necessarily of attitudes to parental 
figures. 


Theoretical Criticisms 


Roy Schafer’s criticisms of the assignment 
of fixed symbolic meanings to Rorschach 
cards are specifically applicable to this discussion.
He states:


Card VII ... has been held to represent the
mother figure, and all responses to this card to represent
therefore concepts and attitudes toward the
mother figure. ... The errors lie in reasoning (1)
as if no adaptive and defensive ego functions stand 
between the stimulus and the deep dynamics of the 
individual, (2) as if there were no relatively neutral 
images available to the patient in his efforts to cope 
with the stimuli, (3) as if there could be only one 
dynamic meaning in the card or area in question, 
(4) as if a statistical trend is the same as a perfect 
correlation, and (5) as if all we have learned about 
personality-rooted individual differences and percep- 
tual organizing principles were still unknown (13, 
p. 146). 


Rorschach himself revealed an intuitive
ability to use dynamic symbolism, but not by 
one-to-one reasoning. Those who are inter- 
ested in a psychoanalytic method of deter- 
mining the patient’s attitude to the father 
figure should reread his article, “The appli- 
cation of the Form Interpretation test,” in 
order to see how careful and conscientious he 
was in determining that the midline responses 
symbolized the father figure for the particular 
patient he had tested (10, pp. 209-213). 

Freud himself, contrary to popular belief,
did not arbitrarily assign a fixed meaning to 
symbols. Even he, gifted as he was in em- 
pathic understanding, laid down a cardinal 
principle that the patient must furnish the 
meaning of his symbols. In writing of dream 
interpretation he stated: 


At the same time I must expressly warn the in- 
vestigator against overestimating the importance of 
symbols in the interpretation of dreams, restricting 
the work of dream-translation to the translation of 
symbols and neglecting the technique of utilizing the 
associations of the dreamer. The two techniques of 
dream-interpretation must supplement one another; 
practically, however, as well as theoretically, preced- 
ence is retained by the latter process which assigns 
the final significance to the utterance of the dreamer, 
while the symbol translations which we undertake 
play an auxiliary part (5, p. 247). 


Frieda Fromm-Reichmann’s comments on 
symbolism, while not intended for the Ror- 
schach, are equally pertinent. She stated: 


Their significance (symbols) definitely varies with 
the personality, the life-circumstances and the prob- 
lems of the dreamer. In one person’s dream, for in- 
stance, a snake may appear as a male symbol, while 
another dreamer may use a snake to express female 
shrewdness and seductiveness. Again a snake may be 
used by an archeologist to express the attributes of 
one or another of the multitude of male or female 
gods or goddesses whose total or partial embodiment 
is that of a snake (6, p. 165). 
Freud and Fromm-Reichmann are quoted
because they emphasize that any attempt to 
determine what lies behind a patient’s use of 
symbols must depend on the specific mean- 
ing of a symbol to him. The assumption of 
a fixed symbolic meaning to any Rorschach 
card does not seem justified in my opinion 
either on an experimental or theoretical ba- 
sis, at the present time. 


Investigation of the Problem 


The procedure used by Rorschach and 
Oberholzer of joint investigation of Rorschach 
responses and clinical psychoanalytic material 
is the ideal method of determining whether 
Cards IV and VII have specific meanings. 
This method of investigation seems not to 
have been reported upon in the literature de- 
spite claims of empirical validation (4, 9). 
There are obvious difficulties in this kind of 
collaboration since only a few Rorschach 
workers are fortunate in working with thera- 
pists who do deep analytical investigation. 

In his private practice as a testing psy- 
chologist the writer administers the Rorschach 
in the usual manner with a free association 
followed by an inquiry period. The individual 
tested is then asked to “Pick out the card 
which is most like your mother,” and then, 
“Pick out the card which is most like your 
father,” after which he is asked the reasons 
for his choices. This kind of additional in- 
quiry was adopted on the hypothesis that a 
response given to the Rorschach cards must 
have its origin in the unconscious of the pa- 
tient. In terms of the premise of Meer and 
Singer and accepted trends in the Rorschach 
field the additional hypothesis can be tested 
that patients’ responses to the above questions 
would have their origin in the same uncon- 
scious motivation to a card which stirred un- 
conscious memories of parental images. 

With but few exceptions the questions were
interpreted by patients to mean, “Which re- 
sponse that you gave to the Rorschach re- 
minds you of your parent.” 

The subjects accepted the questions as part 
of the testing routine. Those who asked for 
additional information were told to rely on 
their own interpretation of the questions. 
Over fifty successive adults were tested in 
this manner. Unfortunately, no records were 
kept of biographical data, but since the research
was based on the assumption of the
universality of meaning to symbols this omis- 
sion is not important. All tested were of both 
sexes, white, and from all socioeconomic 
classes except relief groups. They were re- 
ferred for projective testing by private psy- 
chiatrists, generally after a first or second 
treatment interview. The referring requests 
were usually for evaluation of personality 
assets and potentiality for treatment. All were 
ambulatory, nonorganic, and included all 
diagnostic categories, but as of the time of 
testing were not in need of hospitalization. 


Results 


The list which follows gives an idea of 
areas selected, responses given, and in italics, 
the reasons given by these patients why a 
particular response or area reminded them of a
parent. Numbers used to locate D and Dd 
areas are those of Beck (2, pp. 30-35). 


Card I 
Area 
Father 
W A face with eyes. My father’s face. 
Mother 
W Woman (D 4) held by two people. I saw
my mother dragged around by my two 
aunts. 
D 3 Woman’s body. My mother was heavy like 
this figure. 


D4 Woman modelling. My mother liked to 
wear pretty clothes. 

Woman. My mother was very neat and 
kept herself very straight. 

Form of a lady and hands out as if look- 
ing for help. Doesn’t know where to turn 
because her head isn’t there. My mother 
wasn’t capable of doing anything by 


herself. 
Card II 
Father 
W Medical illustration. My father is a doctor. 


Fire and smoke. Reminds me of a fire. 
When I was a kid I was afraid the house 
would burn down because my father 
smoked in bed and burned the bed 
many a time. 
D 1 Bears. My mother used to call my father
a bear. 
Teddy bears. My father is devoted to my 
young son who had a teddy bear. 
Two bears praying. My father is very re- 
ligious. 
D1


DS 


D1 


D 3 





Mother 


Two bears gossiping. My mother loved to 
talk. 

Bears. Nose of the bear reminds me of my 
mother. 


Father and Mother 


Circus bears kissing and fighting. Like my 
parents. 


Card III 


Father 

Two men bowing and polite. My father 
was a tall lanky person like these. 

Men in tuxedos. Very formal like my 
father. 

Two men. Because these are men. 

Marionettes. My father was clever and did 
parlor tricks. 

Two men. My father had a long nose like 
these men. 


Mother 


Two women exercising. My mother always 
did because she was so afraid to get fat. 

Hands praying. Reminds me of my mother
pleading. 


Card IV 


Father 
Clown. My father loved the circus. 
Bear skin. My father was an old bear. 
Something solid and impregnable and 
sticky. Looks masculine and short. 
Phallic Symbol. Because it’s masculine. 
Looks like a skull. My father is dead. 
Face. My father was a poet and wore his 
hair like that. 


Mother 


Dragon. Has a mean face like my mother. 

Hair wig. My mother had black curly hair. 

Something enveloping me. Like my mother. 
She used to stifle me with her control. 


Card V 


Father 
Bat. Dull and not much of a picture and 
not much of a father. 
Mother 


Huge wings. Wants to protect me as my 
mother did. 


Card VI 


Father 


Man with a bearded profile. My father was 
a physically powerful man although small 


D4 
D 6&7 


D 6&7 
and yet he got steamrollered by my
mother. I saw him as an animal with
powerful shoulders and no head. My 
father had no head and couldn’t stand 
up to my mother. 

Fur. My father liked to hunt. 


Mother 


Fox skin. My mother was a fox-like char- 
acter. 


Card VII 


Father 


Clouds. My father’s favorite hymn was 
“Unclouded Day.” 

Two arms up. My father loved to expostulate.


Growling animal. Like my father. 


Mother 


Iceberg. My mother came to this country 
from the old country in a ship over the 
water and saw icebergs. 

Old women gossiping. Because they’re
women.

Two women dancing. My mother as a girl
was thin waisted.

Pixies. My mother is like a pixie.

Two nude women. My mother isn’t nude.
(sic!)

Two women. My mother is a woman.


Father and Mother 


Two lambs jumping. My parents are lambs. 

Rocks. My parents are dead; reminds me 
of cemetery tombstones. 

Church and two people. My parents passed 
on. 


Card VIII 


Father 
No response. Triangles and squares and 
colors. Balance between the systematic 
and colors like my father. 
Beavers. My father worked very hard. 
Two animals. My father was stern and 
that’s how these animals look. 
Two animals. Animals are tame and my 
father was good and kind and nice. 
Warrior’s head. My father was strict. 
Blood. My father died of bleeding in his 
throat. 


Mother 


Flowers. My mother loved flowers. 

No response. Something soft and has color. 

Flowers. She liked flowers. (Given by two 
patients.) 

Gladioli. My mother loves gladioli. 





Card IX 
Father 

D1 Men. Like my father. 

D 3 Men laughing. My father used to drink and 

loved to run off his mouth. 
No response. My father had auburn hair. 

Mother 

W No response. Because it’s cheerful like my
mother. All bright colors.

Sun breaking through the clouds. Calm and 
peaceful like my mother. 

Tropical flowers. She loved beauty. 

Flowers. All are pastel colors and she likes 
flowers and is soft. 


Card X 
Father 


D4 Sea horses. My father and I loved to walk 
on the beach and find small dead ones.


Mother 


W Flowers. Because my mother liked flowers. 
No response. Utter confusion and lack of 
coordination like my mother, but a pleas- 
ing appearance. 
No response. Pastel colors remind me of 
my mother because she is fair. 


Dds 29 Face. Scornful like my mother. 


Discussion 


The results speak for themselves. No claim 
is made that there is a father or mother card, 
rather that, as Schafer points out, there are a 
variety of reasons against such an assump- 
tion. 

Summary 


This study is concerned with the present- 
day tendency toward interpretation of re- 
sponses to Cards IV and VII of the Ror- 
schach test as though they were symbolic rep- 
resentations of father and mother figures. The 
experimental evidence for such interpretations 
is sparse and there seems no convincing theoretical
proof for such assumptions. The writer attempted a
further check on whether such symbolic meaning
exists. He found that when over fifty patients
were asked to select Rorschach cards
which reminded them of their own parents 
they tended to use all ten cards in such man- 
ner that no distinction between Cards IV and 
VII and the other eight cards could be made. 


Received May 8, 1956. 
The data presented in this paper are based
upon a Q array designed to quantify indi- 
viduals’ assessments of themselves and assess- 
ments of the individuals by clinicians. These 
assessments were intended for analysis in con- 
junction with physiological measures obtained 
on the same individuals (1). It was decided 
in advance that the construction of the rat- 
ing scale should incorporate certain essential 
characteristics as follows: (a) Have satisfactory
interjudge and repeat reliability. (b) Approximate
as nearly as possible the kind of
operations that clinical assessors customarily 
perform in the course of their professional 
duties. Involved in this was the consideration 
that clinicians aspire to make discriminat- 
ing judgments about both intra-individual 
characteristics and interindividual differences. 
Also involved is the desire of the clinician to 
make global and configural evaluations. (c) 
Provide a framework of discourse that would 
be equally useful to clinical assessors from 
different disciplines assessing from independ- 
ent heterogeneous samples of behavior. (d) 
Bridge the gap between the clinical assess- 
ment and the self-description, and provide a 


1From the Veterans Administration Hospital, 
Seattle, and the Department of Psychiatry, Univer- 
sity of Washington School of Medicine. 

2We wish to express our appreciation to Theo- 
dore L. Dorpat and Glenn T. Strand for their par- 
ticipation as judges and for their suggestions and 
help in the early phases of this study. 
basis for studying the nature of the relationships
between other- and self-assessment.

On the basis of these criteria, the Q array 
as described by Stephenson (9) was selected 
as the type of rating scale most appropriate. 
This paper will confine itself to describing 
two specific unanticipated problems which 
were noted after the Q array had been con- 
structed and data collection started. (a) The 
problem of the relationship between two variables
designated as Health-Sickness (emotional)
and Social Desirability; (b) The
problem of confounding, i.e., the difficulty in 
ascertaining the specific effects of these vari- 
ables upon other variables in an assessment. 
The first issue examined is some of the char- 
acteristics of moral and social value judgment 
which penetrate personality assessment. The 
second, the fact that it is necessary to sort 
these value judgments out in order to make 
valid analysis of data based on personality 
assessments. 

Method 


There were two contrasting groups of sub- 
jects: (a) 24 adult male patients hospitalized 
for psychiatric treatment, who will be referred
to as the sick group; (b) 24 male university
students screened for absence of notable
current psychiatric difficulties, who will
be referred to as the well group.3

3 The well group was paid in connection with a study
supported by a grant from the Air Force School of
Aviation Medicine (1).
The array* consisted of 96 items which
sampled 25 personality variables. Since the 
personality variables were used solely as 
scores, their names will not be listed. 

For each assessment (both self and clini- 
cal) of each individual in either group, the 
following operations were performed: (a) 
The items, typed on cards, were sorted by 
the assessor in the following way. He was 
asked to describe the individual as best he 
could using the 96 items by first separating 
them into three piles, the left pile less charac- 
teristic, the middle pile indeterminate, the 
right more characteristic of the individual be- 
ing described. Then the assessor was in- 
structed to rank the cards further into a 
forced “normal” distribution as shown in 
Table 1. (b) Items ranked in this way were
assigned a score ranging from one to eleven 
by the position in which they were placed in 
this forced “normal” distribution. (c) Each 
personality variable was represented by either 
three or four items which had been independ- 
ently classified by not less than five out of 
six clinical judges as sampling the named 
variable. A pool of 550 items had been sub- 
mitted to these judges for this purpose. Q 
array variable scores were obtained by aver- 
aging the scores of the several items sampling 
them. (d) Thus each assessment was reduced 
to 25 scores. These were the scores that were 
used to elucidate the variables, Health-Sick- 
ness and Social Desirability, with which this 
paper is concerned. 
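
The scoring steps (a) through (d) can be sketched in Python. This is only a minimal illustration under stated assumptions: the pile sizes below are hypothetical (the actual forced distribution is the one fixed by Table 1), and the item-to-variable key and the names `score_items` and `variable_scores` are invented for the example.

```python
from statistics import mean

# Hypothetical pile sizes for the 11 score positions; they sum to 96.
# The actual forced distribution is the one given in Table 1.
PILE_SIZES = [2, 4, 8, 12, 14, 16, 14, 12, 8, 4, 2]


def score_items(ranked_items, pile_sizes=PILE_SIZES):
    """Map items ranked from least to most characteristic onto scores 1..11."""
    if len(ranked_items) != sum(pile_sizes):
        raise ValueError("ranking must contain exactly one entry per pile slot")
    scores = {}
    position = 0
    for score, size in enumerate(pile_sizes, start=1):
        for item in ranked_items[position:position + size]:
            scores[item] = score
        position += size
    return scores


def variable_scores(item_scores, variable_key):
    """Average the scores of the three or four items sampling each variable."""
    return {
        variable: mean(item_scores[item] for item in items)
        for variable, items in variable_key.items()
    }


# Usage with made-up data: items 0..95 already ranked by the assessor.
ranking = list(range(96))
item_scores = score_items(ranking)
key = {"variable_A": [0, 40, 95], "variable_B": [1, 2, 50, 94]}  # hypothetical key
print(variable_scores(item_scores, key))
```

With the full key of 25 variables, the same call would reduce one sort of 96 cards to the 25 scores used in the analyses.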

The data presented are based upon four as- 
sessments of each individual in both groups. 
The assessments were performed in the order 
in which they are described: (a) First Self- 
Assessment, “describe yourself,” designated as 
(Sl); (b) Clinical Assessment by a psychiatrist
after one or more diagnostic interviews,
designated as Work-up Assessment (WA); 
(c) Clinical Assessment by a psychiatrist 
other than the one who performed WA, after 


*To save printing costs, the 96 items of the Q 
array have been deposited with the American Docu- 
mentation Institute. Order Document No. 5037 from 
ADI Auxiliary Publications Project, Photoduplication 
Service, Library of Congress, Washington 25, D. C., 
remitting in advance $1.25 for microfilm, or $1.25 
for photocopies. Make checks payable to Chief, Pho- 
toduplication Service, Library of Congress. 
Normal Distribution of the Q-Array Items
a one-hour “stress” interview,5 designated as
Stress Assessment (SA); (d) Second Self- 
Assessment, “Describe yourself as you think 
the doctor who interviewed you during the 
test thinks of you,” designated as Self-via- 
Doctor (Sv). 

The procedure produced 192 assessments, 
four for each individual in the sick group and 
four for each in the well group. The next set 
of operations reduced these 192 assessments 
to eight mean assessments, an average for 
each of the four types of assessment for each 
group as described above. This provided us 
with the characteristics which the assessors 
attributed, on the average, to these two 
groups. 

Operations performed to derive each mean 
assessment were as follows: (a) Each of the 
25 scores for a particular assessment (e.g., 
First Self-Assessment [Sl] for patients) was
individually summed across all patients and
divided by 24, the number of subjects in each
group; (b) These mean scores are tabulated
in their original order (25 scores in all) and 
are designated as Group Mean Assessment. 
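
The averaging just described, and the product-moment correlation later applied to pairs of Group Mean Assessments, might be sketched as follows. The function names and the toy scores are assumptions made for illustration; the study itself used 24 subjects and 25 variable scores per assessment.

```python
from math import sqrt


def group_mean_assessment(assessments):
    """Average each variable score across all subjects in a group."""
    n_subjects = len(assessments)
    n_scores = len(assessments[0])
    return [
        sum(subject[i] for subject in assessments) / n_subjects
        for i in range(n_scores)
    ]


def pearson_r(x, y):
    """Product-moment correlation between two equally long score profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy data: three subjects, four variable scores each.
self_sorts = [[6, 4, 8, 5], [7, 5, 9, 4], [5, 3, 7, 6]]
clinical_sorts = [[5, 5, 9, 4], [6, 4, 8, 6], [7, 6, 7, 5]]
mean_self = group_mean_assessment(self_sorts)
mean_clinical = group_mean_assessment(clinical_sorts)
print(mean_self, mean_clinical, round(pearson_r(mean_self, mean_clinical), 2))
```

Note that the correlation is computed across the variable scores of two mean profiles, not across subjects, which is why N = 25 in the significance note to Table 2.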


Results 
Relationships Between Assessments 


Tables 2 and 3 show the intercorrelations 
(product moment) of Group Mean Assess- 
ments and provide a basis for the study of 
some of the characteristics of both the as- 
sessors and the instrument of quantification, 
the Q array. The fact that the size and di- 
rection of the correlations are stable when the 


5 Subject, supine with a polygraph recording ten 
simultaneous physiological measures, was interviewed 
on specific programed conflict areas by assessing psy- 
chiatrist. 
two successive self-assessments are compared
and when the two independent clinical as- 
sessments are compared provides indication 
of the reliability of the relationships. 

Table 2, under the heading “Correlation,” 
presents the relationships between the Group 
Mean Self-Assessments (Sl and Sv) with 
Group Mean Clinical Assessments (SA and 
WA) for the two groups separately. The cor- 
relations in Table 2 under the heading “HS 
and SD removed” will be referred to in the 
text below. 

The data referred to as “Correlation” in 
Table 2 indicate that while the well group 
describes itself as it is described by clinicians, 
the sick group’s self-description bears no sig- 
nificant relationship to how they are de- 
scribed by clinicians. 

Table 3, under the columns labeled “Cor- 
relation” presents the relationship between 
each of the Group Mean Self-Assessments of 
the sick group with those of the well group. 
These correlations indicate that there is con- 
siderable agreement between the way the sick 
and well groups describe themselves. 

Similarly, Table 3 also presents the cor- 
relations between the Group Mean Clinical 
Assessments of the sick group with the well 


Table 2

Correlations* Between Each Group’s Clinical and Self-Assessment

                             Clinical assessments
                                      Partial correlation,
Assessment and      Correlation       HS & SD removed
group               SA      WA        SA      WA

Sick Group
  Self-Assessment
    Sl              .04    -.01       .54     .57
    Sv             -.02    -.06       .56     .61
Well Group
  Self-Assessment
    Sl              .88     .78       .59     .65
    Sv              .92     .78       .73     .62

* For this and subsequent tables, r = .40 and .51 are significantly
different from zero at the 5% and 1% levels respectively. N = 25,
df = 23, two-tailed test.
Table 3

Correlations of Self-Assessment and Clinical Assessment Between Groups

                                 Well group
                                        Partial correlation,
Assessment and        Correlation       HS & SD removed
group

Self-Assessment       Sl      Sv        Sl      Sv
  Sick Group
    Sl                .77     .76       .66     .60
    Sv                BA      .84       .69     .67
Clinical Assessment   SA      WA        SA      WA
  Sick Group
    SA               -.19     .03       .59     .65
    WA                .18     .01       .73     .62


group. These correlations indicate that the 
average Clinical assessment of the sick group 
bears no significant relationship to the aver- 
age clinical assessment of the well group. 
The pattern of these correlations suggests 
that the sick group has less “insight” than 
the well group. The question still remains— 
insight with respect to what? With this ques- 
tion in mind, the work of Edwards (2) and 
others (5, 7, 8) with the problem of social 
desirability and its effect on the probability 
of endorsement of personality inventory items 
directed our attention to the effect of social 
desirability on our assessments. The articles 
document the strong trend on the part of an 
individual to ascribe behavior or attributes 
to himself which are socially desirable. This 
trend operated independently of whether 
other aspects of the behavior or attitude were 
accurate for the individual involved. 


Relationship Between Assessments and SD
and HS


In keeping with the nature of the groups, 
i.e., well-sick, we also became interested in 
exploring the effect of the variable of
Health-Sickness upon the Q-array assessments. This
further analysis led to some interesting re- 
sults. Measures of Social Desirability desig- 
nated SD and Health-Sickness designated 
HS, were constructed as follows: (a) Six 
Table 4

Correlation* of HS and SD with Group Mean Assessments

Group and         Self assessment      Clinical assessment
variable          Sl       Sv          SA       WA


Sick Group 


HS 59 .68 
SD .67 71 





Well Group 


HS ; 87 81 65 
SD 8: 82 .76 53 





* Average correlations (z’ transformations employed) between 
individual assessments (both self and clinical) with HS and SD 
were also significantly different from zero at the 1% level of 
confidence. For example, for the sick group the following 
correlations were obtained in a different analysis of the data: 
SD with SI .33, SD with WA —.26; similarly for the well group: 
SD with SI .56, SD with WA .37. These correlations were
calculated by using item scores (96 items of the Q array) rather 
than the variable scores (25 variables). 


clinicians (psychiatrists and psychologists 6)
were asked to sort the Q array with respect 
to these variables, Health-Sickness and So- 
cial Desirability, as indicated in Table 1. 
Health and high social desirability were 


sorted at the high end of the continuum; 
(b) These two sets of sorts were reduced to
25 scores as described above. 

The intercorrelations between the Q sorts 
of the judges for these two variables are as 
follows: HS Mean .82, Range .80 to .84; SD 
Mean .74, Range .68 to .82. This amount of 
agreement between the judges for these two 
variables was considered sufficient to warrant 
averaging the sorts. This was done and the 
resultant mean sorts are designated Mean 
Health-Sickness (HS) and Mean Social De- 
sirability (SD). These operations, in effect, 
weighted each of the 25 scores in the Q array 
for HS and SD respectively. The correlation 
between HS and SD was .89. 

Table 4 gives the correlations separately be- 
tween HS, SD, and the various Mean Group 
Assessments and indicates the high degree to 
which these two dimensions entered into the 
assessments. It should be noted that these re- 
lationships are true of both self and clinical 
assessments and raise some difficult problems 


6 These raters included individuals who participated
in assessing subjects for the study.


about assessment in general. To what extent 
are assessors, both self and observer, involved 
in what might be called cultural stereotyp- 
ing? In other words, to what extent do so- 
cial desirability and health-sickness determine 
the variance in assessments? Table 4 suggests 
that in our Q array too much of the variance 
derives from these dimensions and insufficient
amounts are being determined by variables
which describe personality; in other
words, assessments may be based more on 
cultural stereotypes than on factors related 
to kinds of people. 


Partialling Out Effect of HS and SD 


The second problem is that of confounding. 
Edwards and Horst (3) in discussing social 
desirability as a variable in Q-array assess- 
ment studies have demonstrated mathemati- 
cally that the interpretation of the results ob- 
tained from Q arrays would be much clearer 
if this variable were controlled. The opera- 
tions performed below are an attempt to re- 
move the effects of HS and SD from the cor- 
relations already referred to in Tables 2 and 
3, and demonstrate with the data at hand the 
issues raised mathematically by Edwards and 
Horst. The effect of HS and SD upon the cor- 
relations in these tables was removed by the 
method of partial correlations as given by 
Garrett (6, pp. 433-434) for the problem of 
four variables. In terms of the subjects used, 
it was reasoned that HS is the pivotal vari- 
able and it was therefore partialled first. The 
reverse procedure was omitted. The correla- 
tions presented in Tables 2 and 3 under the 
appropriate heading, are those resulting from 
partialling out both HS and SD. The correla- 
tions produced by partialling out only HS 
were of the same order of magnitude as with 
both partialled out. For example: With ref- 
erence to the data for the sick group in 
Table 2, the mean partial correlation be- 
tween self and clinical sorts with HS alone 
partialled out is .54; with reference to the 
well group in Table 2, the mean partial cor- 
relation between self-sorts and clinical sorts 
is .63; for Table 3, the mean partial corre- 
lation between the two sets of self-sorts with 
just HS partialled out is .68; the mean par- 
tial correlation between the clinical sorts with 
just HS partialled out is .55. These facts to- 








gether with the high correlation between HS 
and SD would tend to substantiate the po- 
sition that HS and SD are essentially the 
same variable. 
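
The partialling procedure for four variables can be sketched with the standard formulas for first- and second-order partial correlations. This is an illustration only, not a reproduction of the authors' computations: the function names and the correlation values in the usage lines are made up, with variables 1 and 2 standing for the two assessments being correlated, 3 for HS, and 4 for SD.

```python
from math import sqrt


def first_order_partial(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def second_order_partial(r12, r13, r14, r23, r24, r34):
    """Correlation of variables 1 and 2 with 3 removed first, then 4."""
    r12_3 = first_order_partial(r12, r13, r23)
    r14_3 = first_order_partial(r14, r13, r34)
    r24_3 = first_order_partial(r24, r23, r34)
    return first_order_partial(r12_3, r14_3, r24_3)


# Made-up correlations: 1 = self sort, 2 = clinical sort, 3 = HS, 4 = SD.
hs_then_sd = second_order_partial(0.6, 0.5, 0.4, 0.5, 0.4, 0.8)
# Swapping the roles of variables 3 and 4 removes SD first, then HS.
sd_then_hs = second_order_partial(0.6, 0.4, 0.5, 0.4, 0.5, 0.8)
print(round(hs_then_sd, 4), round(sd_then_hs, 4))
```

Algebraically the second-order partial does not depend on which control variable is removed first, so the two calls agree; the order matters only for the intermediate first-order values, such as the HS-only partials quoted in the text.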


Discussion 


From the foregoing and from an over-all 
examination of Tables 2 and 3, it is clear 
that with the removal of the effect of what 
might be usefully labeled as a cultural stereotype,
the pattern of correlations has changed
distinctly.7 Examination of Table 2 indicates
that for the sick group the partialling
out of HS and SD from the correlations be- 
tween the self and clinical assessments, re- 
sults in a distinct shift. This shift from in- 
significant correlations to correlations of .54 
to .61 is from absence of significant relation- 
ship to a relationship accounting for about 
30% of the residual variance. The shift for 
the well group is in the opposite direction; 
i.e., from correlations accounting for 70% of 
the variance to ones accounting for about 
40% of the residual variance. 
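
The percentages of variance quoted here follow from squaring the correlations. A short check, assuming the Table 2 entries are read as two-place decimals and taking the simple mean of the squared values (the averaging rule is an assumption; the text does not say how the rounded figures were obtained):

```python
# Correlations taken from Table 2 (SA and WA columns for Sl and Sv).
sick_partial = [0.54, 0.57, 0.56, 0.61]
well_raw = [0.88, 0.78, 0.92, 0.78]
well_partial = [0.59, 0.65, 0.73, 0.62]


def mean_variance_explained(correlations):
    """Average of the squared correlations, as a proportion."""
    return sum(r ** 2 for r in correlations) / len(correlations)


for label, rs in [("sick, partial", sick_partial),
                  ("well, raw", well_raw),
                  ("well, partial", well_partial)]:
    print(f"{label}: {mean_variance_explained(rs):.2f}")
```

The three proportions come out near .33, .71, and .42, in line with the rounded figures of about 30%, 70%, and 40% given in the text.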

Similarly, examination of Table 3 indicates 
that partialling out HS and SD from the cor- 
relation between the Self-Assessments of the 
two groups results in a small but consistent 
drop in the size of the correlations. The same 
operation for the correlations between the 
clinical assessments of the two groups re- 
sults in a shift from insignificant correlations 
to ones accounting for about 40% of the re- 
sidual variance. 

It would appear now, with the removal of 
the variance due to HS and SD, that both 
self-assessments and clinical assessments ac- 
count for approximately the same amount of 
residual variance for the two contrasting 
groups. Within the limitations imposed by 
the size of these correlations, an interesting 
hypothesis suggests itself; i.e., that with HS
and SD controlled, self-assessments might be 
substituted for clinical assessments in some 
respects. It would be necessary, of course, to 
ascertain specifically what accounts for the 
overlapping variance. 

Again, within the limitations imposed by 


7 We are unable to state the level of significance
of the shift in the absence (to us) of a known formula
to test the significance of the difference between
these correlations.
the size of the residual correlations, another
hypothesis that might be entertained is that, 
on the whole, sick and well groups both con- 
tain overlapping or like “kinds” of people, 
that both groups have within them the same 
spectrum of characterological types, the prin- 
cipal differences between them being that they 
are sick and well. The proposition might be 
interesting with respect to the general theo- 
retical notions of the continuity of health 
and sickness as opposed to the idea of treat- 
ing them as discontinuous. 

It is not within the scope of this paper or 
the data on which it is based to examine these 
two hypotheses in greater detail, much less to 
test them. They are presented to accent the 
shift in the pattern of correlations obtained 
and concomitant potential difference in inter- 
pretation of results. However, even if the data 
were capable of these two goals, the opera- 
tions of partial correlation are quite cumber- 
some and inefficient. The Q array, to be a 
useful and efficient technique for the quantifi- 
cation of clinical assessments, must be ca- 
pable of controlling and giving directly scores 
referable to variables such as HS and SD, as 
well as other variables which are under study 
at the time. All relevant variables, if they are 
to be manipulated successfully, must be con- 
trolled at the point of experimental design. 
This issue has been discussed by Stephenson 
(9) in the construction and use of Q arrays 
but not with reference to what we have de- 
scribed in this paper as “cultural stereotypes.” 
Fordyce (4) has explored one type of solu- 
tion with reference to social desirability and 
a limited number of other variables. 


Summary 


1. A statement of the desirable character- 
istics of an instrument for quantifying clini- 
cal assessments is given. A description of the 
construction of such an instrument is out- 
lined. However, rather than evaluating the 
instrument in terms of its original intention, 
this paper confines itself to two problems 
which emerged as the data were being col- 
lected: (a) The degree to which total vari- 
ance of group comparisons was dominated by 
what appeared to be a single variable. (b)
The problem of sorting out the variance attributable
to this variable from others being
measured so they could be interpreted correctly.

2. The variable of Health-Sickness defined 
by the operations described in this paper 
seems indistinguishable at this point from 
the characteristic attribute of self-assessors 
described in several papers as social desir- 
ability. 

3. This variable in Q arrays should be con- 
trolled within the structure of the Q array. 
Failure to do so obscures the interrelationship 
of what we have described as “cultural stereo- 
types” and other variables in assessments 
within the particular problem studied. Sta- 
tistical separation of these variables is pos- 
sible but cumbersome and inefficient. 
A Construct Validation of the Edwards Personal
Preference Schedule with Respect to Dependency 1


Alfred C. Bernardin and Richard Jessor 


Recent emphasis upon construct validity 
(5, 8, 10) in psychological tests has had 
several beneficial effects. It has removed the 
“criterion” from a position of unquestioned 
validity in comparison with the “test”; it 
has tended to strengthen the relationship be- 
tween tests and theories; and finally, by re- 
quiring coordination of both test and criterion 
to the construct, it has emphasized careful 
specification of the components or properties 
of test constructs. The latter, especially, has 
immediate implications for experimental at- 
tempts to validate a test. 

The present investigation grew out of an 
interest in defining the construct of depend- 
ency in such a way as to be useful in relating 
behavior on a psychometric personality test 
to behavior in several experimental situations 
developed in light of the properties of the 
dependency construct. Such relationships, if 
found, would constitute one source of evi- 
dence for the construct validity of the test or 
an aspect of the test. 

A review of the literature indicated consid- 
erable agreement upon what is meant by de- 
pendency. Three components were incorpo- 
rated into the present definition on the basis 
of this review and a consideration of available 
personality tests. Dependency was defined as 
including: (a) reliance on others for approval 
or importance of approval from others, (b)
reliance on others for help or assistance, and 
(c) conformity to opinions and demands of 
others. 

The instrument employed in this research 
was the Edwards Personal Preference Sched- 


1 This article is a report of a Master’s thesis (2)
completed by the first author under the supervision
of the second author.


University of Colorado 





ule (PPS) (6), a recently devised inventory 
purporting to measure 15 personality needs 
originating from the work of H. A. Murray 
(9). The inventory has several desirable char- 
acteristics—it attempts to measure normal 
personality variables, it employs a forced- 
choice item form, and it has successfully 
minimized the role of social desirability in 
item choice (6, pp. 14-16). The PPS does 
not include dependency as one of the variables
measured, but two of the variables included
in the inventory appeared related to
our definition of dependency. These two variables
are deference and autonomy, and they
are defined by Edwards (6, p. 5) as follows: 

deference: To get suggestions from others, to find 
out what others think, to follow instructions and do 
what is expected, to praise others, to tell others that 
they have done a good job, to accept the leadership 
of others, to read about great men, to conform to 


custom and avoid the unconventional, to let others 
make decisions. 

autonomy: To be able to come and go as desired, 
to say what one thinks about things, to be independ- 
ent of others in making decisions, to feel free to do 
what one wants, to do things that are unconven- 
tional, to avoid situations where one is expected to 
conform, to do things without regard to what others 
may think, to criticize those in positions of au- 
thority, to avoid responsibilities and obligations. 


Although it would have been possible to 
utilize other PPS variables, e.g., succorance, 
it was decided to select Ss only in terms of 
these two. For purposes of the research, Ss 
were classified as dependent if they scored at
or above the 70th percentile on deference and 
at or below the 50th percentile on autonomy, 
with a minimum separation of 30 percentile 
points between the deference and autonomy 
scores for each S. The Ss were classified as 
independent if they scored at or above the 
70th percentile on autonomy and at or below
the 50th percentile on deference, with a minimum
separation of 30 percentile points between
the deference and autonomy scores.2
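
The selection rule can be stated compactly in code. The sketch below is an illustration under assumptions: the function name `classify` and its argument names are hypothetical, percentiles are taken as already looked up in Edwards' norms, and the consistency-score requirement from footnote 2 is folded in as a third argument.

```python
def classify(deference_pct, autonomy_pct, consistency_score):
    """Return 'dependent', 'independent', or None for an S not selected."""
    if consistency_score < 10:
        return None  # footnote 2: consistency score of at least 10 required
    if abs(deference_pct - autonomy_pct) < 30:
        return None  # minimum separation of 30 percentile points
    if deference_pct >= 70 and autonomy_pct <= 50:
        return "dependent"
    if autonomy_pct >= 70 and deference_pct <= 50:
        return "independent"
    return None


# Made-up percentile pairs for illustration.
print(classify(85, 20, 12))   # high deference, low autonomy
print(classify(30, 90, 11))   # high autonomy, low deference
print(classify(75, 50, 12))   # meets both cutoffs but separation is only 25
```

The third call shows why the separation clause is not redundant: an S at the 75th percentile on deference and the 50th on autonomy meets both cutoffs yet is excluded.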

From an administration of the PPS to 520 
students at the University of Colorado, 55 
dependent and 55 independent Ss were se- 
lected. The mean separation between their 
scores on the two variables was approxi- 
mately 63 percentile points. From these 110 
Ss, persons were assigned at random to one 
of three experiments. 

In summary, then, this paper reports an 
attempt to validate experimentally the con- 
struct of dependency as a variable in per- 
formance on the PPS. 


Experiment 1


Problem. The hypothesis of this study was 
that dependent persons under conditions of 
negative verbal reinforcement (critical com- 
ments) would perform less efficiently on a 
finger-maze learning task than either inde- 
pendents under the same conditions or con- 
trol groups of dependents and independents 
receiving no negative verbal reinforcement. 
The reasoning was as follows. If dependent 
persons, as defined, are more reliant on others 
for approval or consider approval from others 
(especially authority figures such as E) more 
important than do independents, then disap- 
proval or criticism should be more frustrat- 
ing for them than for the independents who 
are less concerned with social approval. In 
the terms of the Child and Waterhouse for- 
mulations (3, 4), disapproval should elicit or 
instigate a greater number and variety of 
interfering responses—such as concern over 
self-adequacy, being liked, other’s evaluations, 
one’s own achievement—from dependents than 
from independents and hence lower the qual- 
ity of performance of the dependents as com- 
pared with the independents in a task requir- 
ing concentration, accurate recall, judgment, 
etc. 


2 All Ss in the study were required to have a con- 
sistency score (6, p. 6) on the PPS of at least 10. 
This is a score which enables evaluation of whether 
an S is responding to the items consistently or on a 
chance basis. The probability of a score of 10 or 
better by chance alone is .15. The percentiles are 
based on Edwards’ norms reported in (6). 


Subjects and procedure. Forty persons, 20 
dependents and 20 independents, were uti- 
lized. Ten dependents and 10 independents 
were randomly selected for the experimental 
condition (negative verbal reinforcement), 
and the remaining 10 dependents and 10 in- 
dependents were designated as control Ss. 


Each S was individually brought into the experi- 
mental room where he was seated and informed that 
he would be asked to learn a finger maze. The maze 
was constructed of raised welding rod fastened to a 
wooden base and consisted of 20 choice-points be- 
tween start and end. S was blindfolded and allowed 
to explore the maze from the starting point to the 
first choice-point for a period not exceeding twenty 
seconds. Instructions were read to him stating that 
he would have fifteen minutes to learn the finger 
maze and that the experimenter would keep track of 
both the number of errors which he made and the 
number of perfect runs through the maze. A stop 
watch was started and S’s progress through the 
maze was recorded by an assistant. When either the 
maze had been learned to criterion of three consecu- 
tive perfect runs or when fifteen minutes had elapsed, 
the S was stopped. All 40 Ss were run through this 
procedure in Experiment 1 but only the 20 Ss in the 
experimental condition were given negative reinforcements
(critical verbal comments) during the experimental
period. At intervals E made such comments
as: “You’re going very slowly,” “Your performance 
is not very good,” “I thought you could do much 
better than this,” etc. In addition, E said “NO!” 
each time S made a blind alley entrance or retracing 
error. The 20 control subjects received no negative 
reinforcement while learning the maze. 


In order to assess quality of performance 
on the maze task, mean errors per run, mean 
time per run, and percentage of savings were 
calculated for each S. The S might make 
errors in three ways: (a) a blind-alley entrance,
(b) retracing the same path, and (c)
vacillation—either standing still in one spot 
on the maze or movements back and forth 
on the paths around a choice-point (at any 
given choice-point where vacillation had oc- 
curred one error was recorded). Errors per 
run and time per run were computed on the 
basis of all runs subsequent to the first run. 
First-run error scores could be due largely to 
chance, since no Ss had prior experience with 
this particular maze. In none of the three ex- 
periments reported here did E have knowl- 
edge of whether a particular S was a depend- 
ent or independent. 

Results. Means and standard deviations on
each of the three measures of quality of
performance for each of our four groups
(dependent experimental, independent experimental,
dependent control, and independent
control) are given in Table 1.


Table 1

Means and Standard Deviations for Dependent and Independent Experimental and Control Groups
on Three Measures of Quality of Performance

                          Dependent                Independent
Measure               Experimental   Control   Experimental   Control

Errors/run    Mean        2.598       1.826        1.540       1.600
              SD           .658        .645        1.100        .576
Time/run      Mean         .981        .709         .762        .739
              SD           .781        .523         .134        .660
% savings     Mean        67.56       84.53        80.47       86.80
              SD           4.36        5.49         3.33        5.23

Since the variance among groups on two of 
the three measures is heterogeneous, a non- 
parametric statistic, the Kruskal-Wallis H test 
(7), was employed to evaluate the data. An 
over-all H test was run for each of the three 
measures. Since the over-all H values were 
significant in each case, it was possible to use 
the H test to compare the four groups within 
each measure. The H values for the relevant 
cell comparisons are presented in Table 2. 
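
As an illustration of the statistic named above, the following sketch computes the Kruskal-Wallis H for a pair of groups from pooled ranks. The scores are invented for the example and are not the data of this study; the function names are hypothetical, and no correction for ties is applied. For two groups H is referred to the chi-square distribution with 1 degree of freedom, for which the .05 critical value is 3.84.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """H = 12 / (N (N + 1)) * sum(R_j^2 / n_j) - 3 (N + 1), without tie correction."""
    pooled = [x for group in groups for x in group]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted_rank_sums = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_rank_sums += rank_sum**2 / len(group)
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * weighted_rank_sums - 3 * (n_total + 1)


# Invented errors-per-run scores for two groups of 10 Ss (illustration only).
group_a = [2.9, 3.4, 2.1, 2.6, 1.9, 3.0, 2.2, 2.8, 3.3, 1.8]
group_b = [1.2, 1.7, 2.0, 1.1, 2.5, 1.4, 0.9, 1.6, 2.3, 1.3]
print(round(kruskal_wallis_h(group_a, group_b), 3))
```

An H exceeding the critical value would be read as a significant difference between the two cells, which is how the pairwise comparisons in Table 2 are interpreted.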

An examination of Table 2 indicates that 
the hypothesis for Experiment 1 is supported. 


Table 2

H Tests Among Relevant Cells on Each of the Three
Measures of Quality of Performance

Measure       Groups      H value   p level        As predicted

Errors/run    DE vs IE     4.010    .05            Yes
              DE vs DC     4.802    .04            Yes
              IE vs IC      .113    not signif.    Yes
              DC vs IC      .568    not signif.    Yes
Time/run      DE vs IE     5.139    .03            Yes
              DE vs DC     6.219    .02            Yes
              IE vs IC     2.282    not signif.    Yes
              DC vs IC      .820    not signif.    Yes
% savings     DE vs IE     4.318    .05            Yes
              DE vs DC     7.402    .01            Yes
              IE vs IC     1.848    not signif.    Yes
              DC vs IC      .820    not signif.    Yes





Dependent Ss under conditions of negative 
verbal reinforcement made significantly more 
errors per run, took significantly longer per 
run, and showed significantly less percentage 
of savings than independents under the same 
conditions. Quality of performance for de- 
pendent experimentals was significantly low- 
ered compared with dependent controls. No 
difference in quality of performance as a func- 
tion of negative reinforcement appeared be- 
tween independent experimentals and inde- 
pendent controls. 


Experiment 2 


Problem. The hypothesis of this study was 
that dependent persons confronted with a 
difficult problem-solving task will request help 
significantly more often than independent per- 
sons, when both groups are informed that as- 
sistance may be gotten upon request. This 
study was an attempt to elicit direct referents 
(requests for help) of one of the properties of 
the dependency construct (reliance on others 
for help) and contrasts with the indirect test 
of another property (reliance on others for 
approval) reported in the preceding study. 

Subjects and procedure. Twenty dependent 
Ss and 20 independent Ss constituted the two 
groups in this experiment. 

The task employed was a difficult Chinese block 
puzzle consisting of 11 pieces which when assembled 
formed a 24-inch cube. All Ss were told that they 
would be asked to solve a difficult block puzzle 
within a 15-minute period. They were further in- 
formed that, because of the unusual difficulty of the 
task, E would be willing to give them as much help 
as they felt they needed, and that whenever help was 
requested, E would place the next piece of the block 
puzzle in place. All Ss worked for a 15-minute period,
during which time E, by means of a concealed
counting mechanism, kept track of two scores. These 
two scores are referred to as the suggestion score and 
the corroboration score. The suggestion score was the 
number of direct requests an S made that E put the 
next piece of the block puzzle in place. The sugges- 
tion score is, then, the record of the number of times 
an S directly requested help on the block-puzzle task. 
The corroboration score was the number of times an 
S made comments asking for reassurance, e.g., “Is
this correct?” “I’ve got it now, haven’t I?” etc. 
This score was considered an indirect measure of the 
tendency to rely on others for help. Only one S was 
able to complete the assembly of the puzzle within 
the allotted 15 minutes, and his data were discarded 
since it had been decided in advance to equate time 
spent on the puzzle. 


Results. Since the variances were heteroge- 
neous for both the suggestion-score and cor- 
roboration-score data, the results were evalu- 
ated by use of the H test. Table 3 indicates 
that on both scores the dependents, as pre- 
dicted, were significantly higher than the in- 
dependents. 

Another way of analyzing these data was 
employed. The median was determined for 
each of the two scores separately. Chi-square 
tests were then applied to evaluate the dis- 
tribution of dependent and independent Ss 
above and below the median for each score. 
The chi-square value for the suggestion score 
was 10.0; the chi-square value for the cor- 
roboration score was 19.6. Both of these 
values, for 1 degree of freedom in each case, 
are significant at well beyond the 1% level 
of confidence. 
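
The reported chi-square values can be reproduced from a fourfold table. The sketch below uses the uncorrected 2 x 2 formula; the cell counts are hypothetical, chosen only because, with 20 Ss assumed in each group, splits of 15 to 5 and 17 to 3 about the median give exactly 10.0 and 19.6. The actual cell frequencies are not reported here.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Uncorrected chi-square for the fourfold table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical counts: rows are dependents and independents,
# columns are Ss above and below the median of the score.
suggestion_chi2 = chi_square_2x2(15, 5, 5, 15)
corroboration_chi2 = chi_square_2x2(17, 3, 3, 17)
print(suggestion_chi2)     # 10.0
print(corroboration_chi2)  # 19.6
```

Both values exceed 6.63, the 1 per cent critical value of chi-square for 1 degree of freedom.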

The data in this study support the hypothe- 
sis that dependent Ss in a difficult problem- 
solving situation where assistance is available 
will ask for help more often than independent 
persons. 


Table 3

Differences Between Dependents and Independents in
Both Suggestion and Corroboration Scores

              Suggestion Score              Corroboration Score
           Independents   Dependents     Independents   Dependents

Mean            .95          3.50            1.45          10.90
SD             1.24          2.31            1.63           7.53
H value              12.42                         22.98
p level              <.01                          <.01





Experiment 3 


Problem. The hypothesis investigated in 
this study dealt with a third component of 
the dependency construct. It was hypothe- 
sized that in a situation requiring perceptual 
judgments to be made before a group, de- 
pendent Ss will conform more to the judg- 
ments of the group than will independent Ss. 

Subjects and procedure. Fifteen dependents 
and 15 independents were run through a pro- 
cedure very similar to that employed by Asch 
(1). 

The Ss were asked to judge whether, of two lines 
drawn on a card, the left line was longer, shorter, or 
the same as the line on the right. Sixteen judgments 
were obtained from each experimental S in a situa- 
tion with seven to nine other persons present, these 
other persons having announced their judgments 
ahead of the experimental S on each of the trials. 
Since the rest of the group had been instructed by 
E to give objectively incorrect judgments for 13 of 
the 16 cards, it was possible to evaluate the degree 
to which an experimental S conformed to the judg- 
ments of the group. Since the variable line differed 
in length from the 2-inch standard by 4 to #4 inch 
plus and minus, it was also possible to determine 
whether dependents differed from independents in 
the line-length difference at which conformity failed 
to occur. 


Results. It was apparent by inspection that 
no differences on either measure of conformity 
appeared between the dependent and inde- 
pendent Ss, and it was therefore not possible 
to reject the null hypothesis. It should be 
noted, however, that Asch’s general results 
(1) were supported since approximately 60% 
of the Ss—both dependents and independents 
—exhibited conformity behavior. 


Discussion and Conclusions 


The aim of the present study was to in- 
vestigate the degree to which a construct of 
dependency could mediate the relationship be- 
tween test behavior and behavior in several 
experimental situations. In this sense, the re- 
search reported constitutes a construct valida- 
tion approach to certain aspects of the PPS. 
The results of the research are construed as 
contributing to the construct validity of the 
autonomy and deference scales of the PPS. 

Some of the problems of the present re- 
search might well be noted. With respect to 
Experiment 1 the hypothesis employed in- 
volves reliance on the concept of interfering 
responses as the basis for the lowered quality
of performance of dependents under negative 
verbal reinforcement. The study, however, 
provides no direct measure of these interfer- 
ing responses and their explanatory role re- 
mains completely hypothetical. It may, that 
is, be possible to account for the obtained re- 
sults within a different formulation than was 
employed here. 

Blindfolding the Ss may have, in addition 
to the negative comments, contributed to the 
lowering of performance level since it is pos- 
sible that dependent persons rely more heavily 
upon visual cues of social reactions than do 
independents. Elimination of visual cues may 
have been more disrupting for the dependents 
than for the independents. There is no way 
of analyzing out this effect which, of course, 
would operate in the direction of the hy- 
pothesis. 

It is possible to point to at least one major 
factor which may have been responsible for 
the negative results of Experiment 3. Unfor- 
tunately the situation was so constructed that 
conformity to the group (which gave 13 out 
of 16 objectively incorrect judgments) was 
attainable only at the expense of disagreeing 
with a fairly objective reality situation. A 
less structured situation where the correct re- 
sponse is less apparent to the S might be suc- 
cessful in differentiating dependents from in- 
dependents. It is possible, for example, to 
have the task consist of ambiguous colors, 
e.g., blue-green, to be named. Another factor 
which may have been involved was the status 
relation of the S to the group. Possibly the
use of group members of higher status than 
S, e.g., graduate students and instructors, 
might have elicited differential conformity be- 
havior from the dependents and independents. 

With respect to the over-all aim of the re- 
search—the establishment of a useful con- 
struct in relation to psychometric test behav- 
ior—it would be worth while to demonstrate 
that the three properties of dependency are 
correlated within persons. It was simply not 
feasible to utilize the same S in several ex- 
perimental situations. Further research in 
which this is achieved would provide more 
direct evidence of the utility of the depend- 
ency construct as defined. 


Summary 


The present study is essentially a construct 
validation of certain aspects of the Edwards 
PPS related to the construct of dependency. 
Three properties of dependency were specified:
reliance on others for approval, reliance
on others for help, and conformity to the
opinions and demands of others. Persons were 
selected as dependent who scored high on 
deference and low on autonomy on the PPS. 
Independents were defined by high scores on 
autonomy and low scores on deference. Three 
experiments were conducted, each one to 
measure a different property of dependency, 
and a total of 110 Ss were involved. The re- 
sults supported hypotheses relating to the 
greater reliance of dependents on others for 
approval and for help. No differences were 
found between dependents and independents 
in group conformity. 

On the whole the research serves to con- 
tribute to the construct validity of the au- 
tonomy and deference scales of the PPS and 
indicates the possible utility of the PPS for 
research studies in personality. 


Received May 18, 1956. 
Performance of High School Students on the
Edwards Personal Preference Schedule¹


C. James Klett


University of Washington²


The Edwards Personal Preference Schedule 
(EPPS) (3) purports to measure fifteen rela- 
tively independent normal personality vari- 
ables drawn from a list of manifest needs de- 
scribed by Murray (5). An important feature 
of the EPPS is the attempt to minimize the 
tendency of subjects to respond in the so- 
cially approved direction by pairing items 
pertaining to differing needs but having simi- 
lar social-desirability scale values and pre- 
senting them to the subject in a forced choice 
format. Edwards (3), in his normative work 
on a college population, found that the inter- 
correlations of the EPPS variables were, in 
general, quite low, and that the fifteen meas- 
ures demonstrated satisfactory split-half and 
test-retest reliability. 

Since the EPPS has potential application
to groups other than the college population 
upon which it was standardized and is, in 
fact, already in use in a variety of applied 
settings such as mental hygiene clinics, hos- 
pitals, and industry, further normative work 
would seem required. This investigation ac- 
cumulated normative data applicable to a high 
school population while at the same time 
studying the operation of several other variables
on EPPS performance. On the basis of
previous research (1), a variable considered
to be of primary importance was that of social-class
membership.


1 This study was a portion of a doctoral dissertation
completed at the University of Washington,
1956. Thanks are extended to Dr. Allen L. Edwards,
who was of invaluable assistance during all stages of
this study. The public school officials of King County
were generous in their assistance, and special thanks
are due to Mr. Hofstetter, Leonard Johnson, and
Paul McCurdy. Shirley Klett, Michael Goldstein,
William Crow, and Cliff Lundeborg assisted in judging
and calculations. Thanks are due to the Marchant
Calculating Co. of Seattle for the loan of a calculator.

2 Now at VA Hospital, Northampton, Massachusetts.


Method 


Subjects. The EPPS was administered in 
two King County high schools outside the city 
of Seattle, Washington. High School A had 
an enrollment of 568 students and was lo- 
cated in an outlying town in the county of 
some 6,600 population. High School B had an 
enrollment of approximately 1,850 students 
and was located in an expanding residential 
suburban area of the city of Seattle. 

Of the total number of students in average
daily attendance in the two schools, 1,907
were tested, yielding a total of 1,638 complete
and scorable records.³ There were, in
the normative sample, 188 boys and 206 girls
from High School A and 616 boys and 628
girls from High School B. The age range of
the group was from 14.5 to 20 years, with
the mean age of the boys 17.1 and that of the
girls 16.9. In grade placement, there were 664
sophomores, 560 juniors, and 414 seniors, almost
equally divided between boys and girls
within each grade. IQ scores⁴ were available
for 1,521 subjects, and these ranged from 66
to 146 with the mean for the boys at 107.2
and that for the girls at 106.8.


3 During the major part of the IBM processing,
five senior boys in High School A were inadvertently
omitted. Most of the normative data, then, is based
on an N of 1,633.

4 California Test of Mental Maturity. A few of
the subjects had only Otis Gamma scores, but a correlation
of the Otis Gamma and the California Test
of Mental Maturity for 1,087 subjects proved to be
.81 and was felt to be high enough to justify the
establishment of conversion tables so that California
equivalents could be obtained for the remainder of
the subjects.


Table 1

Distribution of Socioeconomic Status (SES) of
Subjects in the Normative Sample

                                            Frequency
Socioeconomic group                    Boys   Girls   Total

1 Professional                           77      71     148
2 Proprietors, managers, officials      121     123     244
3 Clerical and kindred                  111     117     228
4 Skilled                               272     282     554
5 Semi-skilled                          145     178     323
6 Unskilled                              52      44      96
7 Unemployed, welfare, relief            26      19      45

Total                                   804     834    1638

Each student supplied his age, grade, and 
a description of his father’s occupation on a 
questionnaire accompanying the testing. From 
the description of father’s occupation, clas- 
sification according to socioeconomic status 
(SES) was made utilizing the tables devel- 
oped by the U. S. Bureau of the Census (2). 


Table 2

Means and Standard Deviations of the EPPS Variables
for the High School Normative Sample

                               Mean             Standard Deviation
Variable                  Boys     Girls         Boys        Girls

 1. Achievement          13.88*    11.13         4.01         4.06
 2. Deference            11.38     11.81         3.47         3.55
 3. Order                10.74     10.68         4.05         4.14
 4. Exhibitionism        15.40*    14.93     (illegible)  (illegible)
 5. Autonomy             14.57*    11.89         4.38         4.20
 6. Affiliation          15.28     17.94*        3.74         3.80
 7. Intraception         13.13     15.87*        4.49         4.27
 8. Succorance           11.03     12.76*        4.49         4.50
 9. Dominance            13.96*    11.99         4.33         4.39
10. Abasement            14.35     17.66*        4.59         4.61
11. Nurturance           14.12     17.35*        4.58         4.30
12. Change               17.12     18.09*        4.11         4.15
13. Endurance            13.81*    11.96         5.11         5.10
14. Heterosexuality      17.31*    14.39         7.04         6.98
15. Aggression           13.88*    11.43         4.39         4.19
Consistency score        10.81     11.68*        2.06         1.81

N                          799       834

* This mean is significantly larger at the one per cent level
than the corresponding mean for the opposite sex.


This classification provides for six categories, 
ranging from professional to unskilled work- 
ers. For the purposes of this study, a seventh 
category was also utilized to include welfare 
cases. The assignment of SES was made by a 
panel of three judges. Reclassification of 100 
randomly selected cases by two independent 
judges yielded reliabilities of .93 and .92. 
The distribution of assigned SES appears in 
Table 1. 

Administration. Administration was carried 
out during regular class periods by the class- 
room teacher, the testing program extending 
over two days as some of the classes were too 
short to allow all to finish. Two class periods 
proved more than ample for completion of the 
test. Absentees on either of the two days ac- 
counted for the majority of the unusable 
records. 


Results 


The data were treated separately by sex
and by high school. Comparison of the means
obtained by the boys from the two schools
revealed only one mean difference significant
at the 1 per cent level, a higher mean for the
achievement variable being found for boys
in High School B. Comparison of the girls in
the two schools revealed deference and abasement
scores to be significantly higher in High
School A, and autonomy, heterosexuality, and
aggression scores to be significantly higher in
High School B.


Table 3

Comparison of Means of EPPS Variables in High
School (H.S.) and College Normative Groups

                                          Mean
                               Boys                  Girls
Variable                  H.S.     College      H.S.     College

 1. Achievement          13.88     15.66*      11.13     13.08*
 2. Deference            11.38     11.21       11.81     12.40*
 3. Order                10.74     10.23       10.68     10.24
 4. Exhibitionism        15.40*    14.40       14.93*    14.28
 5. Autonomy             14.57     14.34       11.89     12.29
 6. Affiliation          15.28     15.00       17.94*    17.40
 7. Intraception         13.13     16.12*      15.87     17.32*
 8. Succorance           11.03     10.74       12.76     12.53
 9. Dominance            13.96     17.44*      11.99     14.18*
10. Abasement            14.35*    12.24       17.66*    15.11
11. Nurturance           14.12     14.04       17.35*    16.42
12. Change               17.12*    15.51       18.09*    17.20
13. Endurance            13.81*    12.66       11.96     12.63*
14. Heterosexuality      17.31     17.65       14.39     14.34
15. Aggression           13.88*    12.79       11.43*    10.59
Consistency score        10.81     11.53*      11.68     11.74
N                          799       760         834       749

Note. Complete college norms may be found in Edwards

* This mean is significantly larger at the one per cent level
than the corresponding mean for the other normative group.


Table 4

Comparison of Split-Half* Reliabilities on the EPPS
Variables for a High School and College Group

                            Reliability Coefficient*
Variable                  High School      College

 1. Achievement               .53            .74
 2. Deference                 .55            .60
 3. Order                     .68            .74
 4. Exhibitionism             .55            .61
 5. Autonomy                  .72            .76
 6. Affiliation               .60            .70
 7. Intraception              .72            .79
 8. Succorance                .72            .76
 9. Dominance                 .73            .81
10. Abasement                 .84            .84
11. Nurturance                .70            .78
12. Change                    .70            .79
13. Endurance                 .78            .81
14. Heterosexuality           .91            .87
15. Aggression                .82            .84
N                            1638           1509

* Split-half, based on 14 items against 14 items, Spearman-Brown
correction applied.

The data for the two schools combined are 
given in Table 2 for boys and girls. The vari- 
ables on which one of the sexes obtained a 
significantly higher mean score than the other 
sex are indicated in the table. 
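
The test used for the sex comparisons is not named in the text. As a rough check, a large-sample z ratio for the difference between two independent means can be computed from the means, standard deviations, and Ns of Table 2; the sketch below does this for achievement, under the assumption (not the author's stated method) that such a ratio is appropriate. A ratio beyond 2.58 corresponds to the two-tailed 1 per cent level.

```python
from math import sqrt


def z_for_mean_difference(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Large-sample z ratio for the difference between two independent means."""
    standard_error = sqrt(sd_1**2 / n_1 + sd_2**2 / n_2)
    return (mean_1 - mean_2) / standard_error


# Achievement in Table 2: boys 13.88 (SD 4.01, N 799), girls 11.13 (SD 4.06, N 834).
z_achievement = z_for_mean_difference(13.88, 4.01, 799, 11.13, 4.06, 834)
print(z_achievement > 2.58)  # True
```

With samples of this size even small mean differences reach the 1 per cent level, which is consistent with the large number of starred means in the table.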

When the means of the high school groups 
were compared with those of the college 
population (3), many significant differences 
were found. These results are summarized in 
Table 3. 

Intercorrelations of the EPPS variables,
utilizing the combined sample of 1,633 cases,
were computed and found to be uniformly
low, the highest being .51 between nurturance
and affiliation, -.44 between succorance and
endurance, and -.40 between achievement
and nurturance. Their range was slightly larger
than that reported by Edwards (3), but most
tended to fall within plus and minus .20, as
do those of Edwards.

The split-half reliability coefficients for the 
EPPS variables were computed by correlat- 
ing the row score and column score for each 
variable for 1,638 cases and correcting by 
the Spearman-Brown formula. The reliability 
coefficients, given in Table 4, ranged from .53 
to .91. For purposes of comparison, the sec- 
ond column of Table 4 shows the reliability 
coefficients obtained by Edwards (3) for a 
college population, which ranged from .60 to 
.87.
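
The Spearman-Brown correction mentioned above estimates the reliability of the full scale from the correlation between its two halves. The sketch below gives the standard formula and its inverse; the half-test correlations themselves are not reported, so the value recovered for the lowest coefficient (.53) is only what the formula implies. Function names are illustrative.

```python
def spearman_brown(r_half: float, length_factor: float = 2.0) -> float:
    """Predicted reliability when a test is lengthened by length_factor."""
    return length_factor * r_half / (1 + (length_factor - 1) * r_half)


def implied_half_correlation(r_full: float) -> float:
    """Invert the doubling case: the half-test r that yields r_full."""
    return r_full / (2 - r_full)


# Lowest corrected coefficient reported for the high school group.
r_half = implied_half_correlation(0.53)
print(round(r_half, 2))                  # 0.36
print(round(spearman_brown(r_half), 2))  # 0.53
```

The correction always raises a positive half-test correlation, so the uncorrected row-against-column correlations would be lower than the tabled coefficients.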

The correlations of the EPPS variables 
with IQ, grade, age, and SES are shown in 
Table 5. 

In order to investigate further the relation- 
ship between SES and EPPS scores, a fac- 
torial design was utilized involving two levels 
of IQ, designated high (H) and low (L), sex, 
and the six major classifications of SES—24 
groups in all. With 10 subjects in each group, 
an analysis of variance was conducted for 
each of the 16 EPPS variables. A summary 
of these 16 analyses appears in Table 6. 


Table 5

Correlations of IQ, Age, Grade, and SES
with the EPPS Variables

                                     Correlation
Variable                  IQ       Age         Grade     SES*

 1. Achievement           .24      -.03         -.01     -.12
 2. Deference            -.17       .01         -.02      .04
 3. Order                -.20      -.01         -.04      .02
 4. Exhibitionism         .11      -.04         -.01     -.06
 5. Autonomy              .08       .05          .04     -.02
 6. Affiliation          -.03      -.03          .01      .00
 7. Intraception         -.02       .02          .04     -.01
 8. Succorance           -.06      -.07         -.05      .04
 9. Dominance             .15       .01          .04     -.14
10. Abasement            -.14      -.06         -.08      .06
11. Nurturance           -.12       .00          .00      .08
12. Change                .02       .03          .06      .05
13. Endurance            -.10       .04         -.01      .01
14. Heterosexuality      -.12   (illegible)     -.01     -.03
15. Aggression           -.24      -.18         -.08     -.08
Consistency score         .19      -.06          .02     -.04

N                        1521      1633         1633     1633

* Negative correlation indicates positive relationship, as SES
becomes lower as size of number increases.
Table 6

Summary of Analyses of Variance of the EPPS
Variables with Direction of Results
Shown for IQ and Sex*

Significance Level

Variable IQ Sex

H L B G

1. Achievement
2. Deference 5
3. Order 1
4. Exhibitionism 1
5. Autonomy 1 1
6. Affiliation 1
7. Intraception
8. Succorance
9. Dominance 1 5
10. Abasement 1 1
11. Nurturance 1
12. Change
13. Endurance 1
14. Heterosexuality
15. Aggression 1
Consistency score 1 5

* Interaction results discussed in text.


None of the simple interactions proved to be 
significant, but the triple interaction of IQ ×
sex × SES produced differences at the 1 per
cent level for affiliation and at the 5 per cent 
level for exhibitionism. It was found that level 
of IQ was significantly related to a number of 
EPPS variables, but that differential class 
membership as here defined was much less 
pervasive in its effect. 


Discussion 


A comparison of the high school and col- 
lege data clearly indicates that separate nor- 
mative standards should be applied to the two 
groups. Norms by sex were found to be neces- 
sary for the high school group as was also the 
case for the college group. It will be noted, 
from an examination of Table 3, that approxi- 
mately the same sex differences occur in the 
high school group as in the college group. The 
normative data that have been collected can 
be used in the interpretation of test scores ob- 
tained by high school students to the extent 
that the obtained sample can be considered 
representative of the high school population
to which it might be applied. Psychologists
who wish to utilize these new norms should 
consult the foregoing description of the pres- 
ent sample in terms of IQ, sex, age, and grade 
to determine their applicability. A copy of 
the percentile norms as derived from this 
study is obtainable from the publishers of
the EPPS (3). 

The distribution of socioeconomic status 
could be expected to vary considerably from 
school to school but, in view of the low rela- 
tionship between SES and the various EPPS 
scores, this would not seem to be of concern 
in the use of the normative tables. 

The relative independence of the EPPS 
variables as revealed by the intercorrelations 
confirms Edwards’ findings. The reliability 
coefficients are, in some cases, less satisfac- 
tory than those of Edwards (3) but are gen- 
erally comparable to them. 

The results obtained for high school and 
college groups and for boys and girls lend a 
good deal of face validity to the EPPS vari- 
ables in the sense that the differences found 
are consistent with expectation. The inter- 
ested researcher should find an examination 
of these results fruitful for the development 
of hypotheses for further research with the 
EPPS. 


Summary and Conclusions 


The goal of the present study was to col- 
lect normative data on a high school popula- 
tion in the course of investigating perform- 
ance on the Edwards Personal Preference 
Schedule (EPPS) (3) as a function of IQ, 
grade, age, sex, and differential social class 
membership (SES). 

Normative data were collected from two 
King County, Washington, high schools which 
are believed to constitute an adequate refer- 
ence group. A total of 1,633 subjects was 
used in the establishment of high school 
norms, which are obtainable from the pub- 
lisher of the EPPS (3). The reliabilities of 
the EPPS variables and their means and 
standard deviations are presented for the high 
school group. 

Testing the effect of socioeconomic status 
on the EPPS variables failed to establish that 
this variable was profoundly related to the 
test scores. Only two of the sixteen EPPS 
variables, autonomy and dominance, were
found to be significantly related to socioeco- 
nomic status when IQ was controlled in an 
analysis of variance design. Significant corre- 
lations were obtained between socioeconomic 
status and other EPPS variables, but the co- 
efficients are low enough to justify the exclu- 
sion of this variable from practical considera- 
tion in the interpretation of the EPPS scores. 

The significant differences between various 
groups on EPPS scores lend considerable face 
validity to the needs as defined by Edwards. 


Received September 5, 1956. 
Early Publication. 
Simulation of “Normalcy” by Psychiatric Patients
on the MMPI¹


Studies on simulation of normal or abnormal
adjustment on psychological tests have
been conducted by a number of techniques 
including the Rorschach (1, 2, 4, 11), sen- 
tence completion test (9), and objective per- 
sonality schedules (3, 6, 7, 8, 10). However, 
studies on simulation of “normalcy” by psy- 
chiatric patients are, to our knowledge, non- 
existent. Such studies would appear to be 
justified by the following considerations: they 
may throw light on the concepts of normal 
adjustment which are held by different kinds 
of psychiatric patients; they may yield im- 
portant implications for psychotherapy, based 
upon the nature of the differences between 
the patient’s self concept and his ego ideal, 
as inferred from differences between his origi- 
nal and simulated test performances; they 
may prove of value in predicting length of 
hospitalization or outcome of therapy, since 
the ability to “improve” on the tests may re- 
flect a degree of reality orientation or ego- 
strength suggestive of a favorable prognosis. 

The present paper, which is part of a larger 
study involving tests tapping different levels 
of personality, presents preliminary findings 
based on the Minnesota Multiphasic Person- 
ality Inventory (MMPI). In undertaking this 
investigation, answers were sought to the fol- 
lowing questions: 


1 The writers are indebted to W. L. Martinsen and 
the staff of the Medical Illustrations Laboratory for 
the figures and slides which they prepared; and to 
Saul Kupferman and Raymond Anderson for con- 
ducting some of the testing sessions. 

2 Formerly clinical psychology trainee at VA Neu- 
ropsychiatric Hospital, Los Angeles. 


1. To what extent can psychiatric patients 
produce “normal” test performance when re- 
quested to do so? 

2. What other kinds of changes in test per- 
formance occur; and are these different for 
patients of different diagnostic categories? 

3. May ability to give improved perform- 
ance when simulating normalcy be predictive 
of shorter hospitalization? 


Procedure and Results 


The experimental procedure was as follows: 
Patients took the MMPI routinely, on enter- 
ing the hospital. The next day the test was 
repeated, but with instructions to answer it 
“the way a typical, well-adjusted person on 
the outside, would do.” Upon completing the 
test, each patient was asked to describe how 
he did it, what his method was; and his com- 
ments were noted. Forty-five consecutively 
hospitalized male patients participated in the 
study. In each case, the psychiatric admis- 
sions-diagnosis was obtained from the pa- 
tient’s clinical folder. 

In order to see how many patients actually 
improved their performance, the authors in- 
dependently made blind sortings of the pairs 
of original and simulated profiles for each 
patient, based on the expectation of improve- 
ment in the simulated profile. Where both 
investigators correctly sorted the patient’s 
profiles, and where this coincided with a re- 
duction in the Total T score based on the 
sum of the nine clinical scales, the case was 
considered as improved. On this basis, 33 out 
of the 45 cases (73%) were judged as improved
in performance when simulating normalcy.
The rest either did not improve, or
became worse. Figure 1 shows the change in
the over-all pattern for these 33 improved
patients.

Fig. 1. Mean profiles on original and on simulated
performance, based on 33 “improved” cases.

The changes in the direction of the F and
K scales are of interest, the patients showing
increased defensiveness (K) and decreased
confusion (F). (If the F scale is considered simply
as a validity indicator, the simulated performance
would appear to be more valid than the
original!) It will be noted that the profile
changes from that of a highly disturbed incipient
paranoid schizophrenic to the double-spike
curve of the “anxiety-free psychopath.”
It is as though the typical, disturbed schizophrenic
patient were to say, “If only I could
lose my anxiety and guilt feelings, and my
feelings of personal inadequacy, and if I felt
less inhibited in accepting and acting on my
impulses, then I would not be the harassed,
mentally-ill person that I am.”

Statistical analysis of the differences in the 
profiles of the nine clinical variables was un- 
dertaken, based on a method recently devised 
by Gengerelli and Butler (5). This method, 
which takes into account the absolute magni- 
tude of each individual’s score on each vari- 
able, revealed the profiles to be significantly 
different at beyond the one per cent level of 
confidence, with a t value of 5.4 for 32 degrees
of freedom.³


3 Two profile-numbers were computed for each individual
(one representing his original performance
and the other his simulated performance) by multiplying
his standard T score on each variable by
constants according to the following equation:
P_i = 8.11(Mf) + 6.13(Ma) + 4.19(Hy) + 2.11(Hs) + 0(Pa)
− 2.23(Pt) − 4.21(Pd) − 6.17(D) − 8.21(Sc).
The resulting pairs of profile numbers were treated,
with due regard to signs, as two separate distributions.
Means and variances were computed and the
usual t test made.
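
The profile-number computation in the footnote can be sketched in a few lines. The weights below are taken from the equation as printed. The report does not spell out the form of the t test; the paired form used here is an assumption, chosen because it gives n − 1 = 32 degrees of freedom for 33 cases, which matches the figure reported. The function names and the sample scores are hypothetical.

```python
import math
from statistics import mean, stdev

# Weights copied from the footnote's equation; Pa carries a weight of zero.
WEIGHTS = {
    "Mf": 8.11, "Ma": 6.13, "Hy": 4.19, "Hs": 2.11, "Pa": 0.0,
    "Pt": -2.23, "Pd": -4.21, "D": -6.17, "Sc": -8.21,
}


def profile_number(t_scores):
    """Weighted sum of the nine clinical-scale T scores for one profile."""
    return sum(weight * t_scores[scale] for scale, weight in WEIGHTS.items())


def paired_t(original, simulated):
    """Paired t statistic on per-patient differences (assumed form).

    Returns the t value and its degrees of freedom, n - 1.
    """
    diffs = [o - s for o, s in zip(original, simulated)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


# Hypothetical example: a completely flat profile at T = 50.
flat_profile = {scale: 50 for scale in WEIGHTS}
print(profile_number(flat_profile))
```

Because the positive and negative weights nearly cancel, a flat profile yields a profile number close to zero; the number departs from zero only when the profile has shape.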


The kinds of diagnostic changes actually 
produced in the profiles of all 45 patients, im- 
proved and unimproved, are shown in Table 1. 
The table reveals that 28 of the 45 cases 
(62%) showed no change in diagnostic cate- 
gory, although many of these showed im- 
provement in terms of a “softening” or re- 
duction in the deviancy of the personality 
pattern. Only five cases (11%) became “nor- 
mal.” The remaining 27 per cent underwent 
a “diagnostic shift” to another category. 

Of the 24 schizophrenics, exactly half re- 
mained unchanged, while the rest converted 
to character disorders, psychoneurotics, psy- 
chosomatic, and “normal.” Of the 13 charac- 
ter disorders, 11 remained unchanged, while 
two became “normal.” Of the four manic-de- 
pressives, two remained unchanged; one be- 
came schizophrenic; and one a character dis- 
order. And of the four psychoneurotics, three 
remained unchanged although less deviant, 
while one gave a schizophrenic profile. 

Thus, although in general the simulated 
profile was better than the original, most pa- 
tients, under the conditions of the experiment, 
did not produce a “normal” performance, but 
rather changed the degree of severity or the 
nature of the behavior disorder. Figures 2 and 
3 show examples of two kinds of changes 
which occurred. Space considerations preclude 
the showing of other interesting examples. 

Figure 2 illustrates a “softened” deviancy
pattern, and indicates a diagnostic shift from
an incipient schizophrenic reaction to one
of anxiety neurosis. Figure 3 illustrates a
“mirror image” or reaction-formation pattern
which sometimes results when the patient
strives hard to deny unacceptable feelings or
impulses. This figure exemplifies a diagnostic
shift from a decompensating obsessive-compulsive
neurosis to an acting-out type of
character disorder.


Table 1

“Diagnostic Shifts” Between Original and
Simulated MMPI Profiles

Original diagnosis            Simulated diagnosis        Number

Schizophrenia (N = 24)        Schizophrenia                12*
                              Character disorder            6
                              Psychoneurosis                2
                              Psychosomatic syndrome        1
                              Normal                        3

Character disorder (N = 13)   Character disorder           11*
                              Normal                        2

Manic depressive (N = 4)      Manic depressive              2*
                              Schizophrenia                 1
                              Character disorder            1

Psychoneurosis (N = 4)        Psychoneurosis                3*
                              Schizophrenia                 1

* Diagnosis remained unchanged.


Fig. 2. An example of a “softened” deviancy pattern.


The patients’ comments were frequently of inter- 
est, as evidenced by the following examples: “It was 
pretty hard at first but then you just think: I’m 
Superman, there’s nothing wrong with me.”; “Well, 
I was thinking of my dad, for example. He’s always 
been my ideal.”; “Well, I just put down the opposite 
to what I did yesterday.”; “Through books and 
things you get to understand the average person.”; 
“I have answered these questions as a so-called normal
person would (if there is such a person). What
is a normal person?”; “I answered these questions 
as I hope to answer truthfully in the near future.” 


The degree and nature of the changes have 
possible diagnostic and therapeutic implications.
For example, in some cases, especially
in the character-disorder class, the similarity
of the two profiles, along with the verbaliza- 
tions, indicates that the patients do not feel 
there is anything wrong with them. As one 
patient put it, “I am well adjusted. I have 
no nervous problems.” Other patients, by the 
very mild softening of their deviant profiles, 
express that they are essentially normal. In 
the words of one patient, “What would you 
say if I told you I wouldn’t have to change 
mine a whole lot to be typically well ad- 
justed?” Others feel quite hopeless. For ex- 
ample, one patient said, “How can we an- 
swer as a typical, well-adjusted person if 
we're not?” The cases of “diagnostic shift” 
suggest that, for some patients, being normal 
involves the use of different (usually less 
deviant) adjustment patterns, including the 
elimination of bizarre symptoms. 

There was a significant reduction in Total 
T score for all 45 patients on the nine clinical 
scales, the critical ratio being 4.6. This means 
that most psychiatric patients were capable 
of recognizing and avoiding many of the in- 
dividual deviant responses, even though they 
were still largely unable to produce normal 
profile patterns. For the 33 “improved” cases, 
the critical ratio rose, as one would expect, 
to 6.3. There was also a significant reduction 
in the number of scales at or above the criti- 
cal T score of 70. 

As Table 2 shows, significant improvement 
took place on all scales except the Lie scale 
and the Mf and Ma scales. 


Table 2

Changes in T Score on the MMPI Scales

          Mean                 Critical
Scale     change     SE_M      ratio

L         − 0.4       0.47      0.9*
F         − 5.6       1.50      3.7
K         + 3.5       0.74      4.7
K-F       + 9.1       2.10      4.3
Hs        −13.1       3.30      3.9
D         −22.6       4.00      5.6
Hy        −11.0       2.40      4.6
Pd        −14.0       2.36      5.9
Mf        − 5.9      15.10      0.4*
Pa        −14.8       2.90      5.1
Pt        −16.9       3.06      5.4
Sc        −20.9       3.50      6.0
Ma        − 1.0       1.80      0.6*

* Not significant.


Fig. 3. An example of a reaction-formation pattern.
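
The critical ratios in Table 2 are each scale's mean change divided by its standard error. A minimal sketch of that arithmetic follows, using the table's values as read from the printed page; the dictionary and function names are illustrative only. Recomputed ratios can differ from the printed ones in the last digit, possibly because the printed means and standard errors are themselves rounded.

```python
# (mean change in T score, standard error) per scale, as read from Table 2.
TABLE_2 = {
    "L": (-0.4, 0.47), "F": (-5.6, 1.50), "K": (3.5, 0.74), "K-F": (9.1, 2.10),
    "Hs": (-13.1, 3.30), "D": (-22.6, 4.00), "Hy": (-11.0, 2.40),
    "Pd": (-14.0, 2.36), "Mf": (-5.9, 15.10), "Pa": (-14.8, 2.90),
    "Pt": (-16.9, 3.06), "Sc": (-20.9, 3.50), "Ma": (-1.0, 1.80),
}


def critical_ratio(mean_change, standard_error):
    """Absolute mean change expressed in standard-error units."""
    return abs(mean_change) / standard_error


for scale, (change, se) in TABLE_2.items():
    print(f"{scale:>3}  {critical_ratio(change, se):.1f}")
```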
Apparently, the “improved” patients did
not feel it necessary to present more of a so- 
cially acceptable front on the obvious items 
of the Lie scale, although defensiveness in- 
creased significantly on the more subtle K 
and K-F scales. Furthermore, the patients did 
not seem to consider the Mf and Ma scale 
items as indicative of pathology. All of the 
other scales, however, underwent significant 
change, the greatest mean differences in T 
score taking place on the D, Sc, and Pt scales. 

It seemed reasonable that the degree of 
test improvement might be a useful index of 
potentiality for clinical improvement. It was, 
therefore, hypothesized that the greater the
improvement shown when the original and
simulated profiles were compared, the less 
would be the need for prolonged hospitalization.
To test this, the number and corresponding
percentage of correct identifications (trial
visit or discharge, on the one hand, vs. continued
hospitalization, on the other) were obtained
three months after testing, against criterion
reductions of more than 90, more than 65,
and more than 45 in the Total T score for
the nine clinical scales.
Table 3 shows the relationship between de- 
gree of improvement when simulating “nor- 
malcy” and status (hospitalized or nonhos- 
pitalized) three months later. 

At each criterion level, the discharged pa- 
tients who equalled or exceeded the criterion 
measure, and the hospitalized patients who 
did not equal the criterion measure, were 
considered correctly identified by the appli- 
cation of that criterion. 

It will be seen from the table that the
higher the criterion value, the greater the accuracy
of prediction for patients remaining in
the hospital. (That is, these patients were
less capable of producing that much change.)
On the other hand, the lower the criterion
value, the greater the accuracy of prediction
for patients who were out on trial visit or
discharge status. (That is, more of these patients
could meet this less stringent criterion.)


Table 3

Correct Identifications of Discharged vs. Hospitalized
Status at Three Reduction Levels
of MMPI Change

                                         Correctly Identified
                                        Discharged   Hospitalized
Reduction in total T score                 pts.          pts.

90 or more (= 1 sigma per scale)           47%           81%
65 or more (= 0.7 sigma per scale)         68%           65%
45 or more (= 0.5 sigma per scale)         74%           58%


Since a criterion change of 65 or more 
gave the least number of false negative identi- 
fications for either group of patients, a 2 by 
2 chi-square test was performed. This yielded 
a chi-square value of 5.0 which proved sig- 
nificant, for 1 degree of freedom, at the .025 
level of confidence, indicating a significant 
relationship between ability to improve and 
early trial visit or discharge. 
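
The 2 by 2 test described above can be sketched as follows. The report gives only percentages, so the cell counts here are an assumption: 19 discharged and 26 hospitalized patients are the group sizes that reproduce the percentages in Table 3 (13 of 19 is 68%, 17 of 26 is 65%). The formula is the ordinary uncorrected chi-square for a fourfold table; whether the authors applied a continuity correction is not stated.

```python
def correctly_identified(reduction, discharged, criterion):
    """Apply the report's rule: discharged patients should meet the
    criterion reduction, hospitalized patients should fall short of it."""
    meets = reduction >= criterion
    return meets if discharged else not meets


def chi_square_2x2(a, b, c, d):
    """Uncorrected chi-square (1 df) for the fourfold table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Assumed counts at the criterion of 65 (back-calculated, not from the report):
# rows = discharged / hospitalized, columns = met criterion / did not.
chi2 = chi_square_2x2(13, 6, 9, 17)
print(round(chi2, 2))
```

Under these assumed counts the statistic comes out close to the reported value of 5.0.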

It is of interest to note that neither the 
Total T score on the original performance, 
nor on the simulated performance, taken 
separately, showed any relationship to length 
of hospitalization. In other words, neither the 
initial degree of disturbance, nor the simu- 
lated degree of disturbance, is predictive; but 
only the change in degree of disturbance. 

Apart from predictive value, the double- 
testing approach herein employed seems to 
offer another advantage, namely that of high- 
lighting the subject’s problem areas. Such 
problem areas are not always immediately 
apparent either to the therapist or to the pa- 
tient. The kind of clear and immediate focus 
which this approach seems capable of yielding 
could be of real use to the diagnostician or 
therapist. For example, the patient whose al- 
tered profile declares, in effect: “I would be 
normal if I could accept my passive tendencies 
and feminine interests more comfortably,” or 
“Tf I had fewer doubts about my manhood, I 
could be less anxious,” may be helped early 
in the therapeutic process to view these feel- 
ings and attitudes in a perspective which 
minimizes any seriously disruptive effects they 
might have on treatment. The psychotherapist 
is thus enabled to correct the patient’s gross 
misconceptions early and, by helping the pa- 
tient to make a more realistic appraisal of his 
strengths as well as his limitations, to clear 
the way for further psychotherapeutic en- 
deavors. 

Perhaps a word of caution is in order lest 
the data lead to unduly optimistic interpretations
of the changes observed in the “improved”
cases. It might be well to distinguish
between favorable prognosis for early discharge
from the hospital and psychotherapeutic
accessibility. Many patients capable
of simulating normalcy on a lip-service
basis might also be capable of producing su- 
perficial changes in their outward behavior 
which would enable them to be rated much 
improved and ready for release. Some of these 
patients may merely have gained temporary 
control of their erratic and frequently unman- 
ageable impulses without any new basic un- 
derstanding or mastery of these impulses. 
Others, who may recognize how much they 
are at variance with the rest of the world, 
may be unable to effect the basic changes 
which psychotherapy hopes to achieve. 


Summary 


In summary, this study revealed marked 
individual differences in the ability of psychi- 
atric patients to simulate “normalcy” on the 
MMPI. Although most of the patients (73%) 
gave an improved performance, very few 
(11%) became “normal” and some became 
worse. Ability to improve differed for pa- 
tients in different diagnostic categories. Im- 
provement was manifested, in many cases, by 
a reduction in the deviancy of the same diag- 
nostic pattern; in other cases, by a “diag- 
nostic shift” to a less seriously disturbed 
category. Areas of emotional disturbance ap- 
peared to be highlighted in terms of differ- 
ences between the patient’s self concept and 
his ego ideal, as these could be inferred from 
the changes that took place in the profiles. 
Improvability on the test appears to be a
favorable prognostic indication for early hospital
discharge. Some diagnostic and therapeutic
implications of the double-testing approach
used in this study were briefly
discussed.


Received June 5, 1956. 
Factors Influencing the Prediction of Behavior
from a Diagnostic Interview¹


Helene Borke² and Donald W. Fiske
The University of Chicago 


This study explored the effects of different 
conditions upon the accuracy of prediction 
from a diagnostic interview. It differs from 
previous studies in two ways: it utilized di- 
rect interaction as one condition and it used 
responses to a nonverbal preference instru- 
ment as one of the behaviors to be predicted. 


Procedure 


Four judges studied four subjects (Ss) in 
a diagnostic interview situation. The Ss were 
male veterans of World War II, between the 
ages of 25 and 35, who were diagnosed as 
anxiety neurotics by two physicians, one of 
whom was a psychiatrist. The judges were 
third- and fourth-year male VA trainees. 
Each judge came in contact with each of the 
Ss under one of the following conditions: di- 
rect interaction, observation of the interac- 
tion from behind a one-way screen, listening 
to a recording of the interview, and reading 
a verbatim transcript. The behavior to be 
predicted consisted of two Q sorts (8) with 
fixed distributions: the S’s conscious atti- 
tudes about himself were reflected in a ver- 
bal Q sort consisting of 100 items based on 
Murray’s needs (6); his picture preferences 
were elicited by a Q sort of one hundred 


1 This study was conducted at the Hines Veterans 
Administration Hospital. The authors gratefully ac- 
knowledge the assistance of Mr. Frank Brogno, Mr. 
Edward Katz, Mr. Arthur Oriel, and Dr. John Mac- 
Gahan, who served as judges, and Dr. Roy Brener 
and others on the hospital staff. 

2 This paper is based on a dissertation by the first 
author submitted in 1952 to the University of Chi- 
cago in partial fulfillment of the requirements for 
the doctoral degree (1). She is now a psychological 
consultant for the Family Service Association of 
DuPage County, Illinois. 
picture-postcard reproductions of famous
paintings selected according to six Rorschach 
determinants—form, color, mood, human 
movement, sex, and aggression. The verbal 
statements and a complete account of the 
selection of the pictures are given elsewhere 
(1). The Ss were told that both the Q sorts 
and the interview which followed were part
of the regular hospital diagnostic routine.
The judges made both sorts for themselves
and also for a “typical” anxiety neurotic before
making predictions for the experimental
Ss. Immediately after his contact with an
S, each judge was asked to predict how he
thought the particular S had sorted the verbal
and picture materials. Following a latin-square
design, each judge predicted for each
of the Ss under a different condition. Each
prediction sort was then correlated with the
corresponding patient’s self-sort. It was also
correlated with the judge’s sort for himself
and for a “typical” anxiety neurotic.³ The
resulting correlation coefficients were analyzed
by W, the coefficient of concordance, a
measure of association among sets of ranks (5).

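
The coefficient of concordance named above can be sketched as follows, using Kendall's standard formula for untied ranks, W = 12S / (m²(n³ − n)), where m is the number of rankings and n the number of objects ranked. How the authors arranged their correlation coefficients into ranks is not described; ranking the four conditions within each judge, as in the usage lines, is an assumption, and the numbers are hypothetical.

```python
def to_ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def kendalls_w(rank_rows):
    """Kendall's coefficient of concordance for m rankings of n objects."""
    m = len(rank_rows)
    n = len(rank_rows[0])
    totals = [sum(row[j] for row in rank_rows) for j in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((total - mean_total) ** 2 for total in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical accuracy correlations for one judge under the four conditions.
print(to_ranks([0.31, 0.12, 0.45, 0.20]))
```

W runs from 0 (the rankings cancel out completely) to 1 (every ranking is identical), so a W near zero across judges would correspond to the finding that the type of contact made no consistent difference.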
Results 


Effects of Differences in Environmental Conditions
on Predictions


The type of contact the judges had with an
S (interviewing, observing from behind a one-way
screen, listening to a recording, or reading
a verbatim transcript) had no significant
influence on their ability to predict the self-descriptions
or the picture preferences as
measured by the Q-sort technique.


3 The measurement of accuracy of prediction by a
correlation coefficient between two sorts with fixed
distributions ignores some components contained in
a total accuracy score, as analyzed by Cronbach
(2). It is obvious that the findings of our study are
restricted to the accuracy measure used.


Effects of Judges on Prediction 


The judges’ preconceived ideas about how 
a “typical” anxiety neurotic might be ex- 
pected to behave significantly influenced their 
predictions for particular Ss on both the ver- 
bal and picture sorts. The median correlation 
between the judges’ predictions for a “typical”
anxiety neurotic and their predictions for
the individual Ss was .53 on the verbal sorts 
and .50 on the picture sorts. At the same 
time the findings indicate that for the verbal 
material, actual contact with an S did result 
in improved predictions over the stereotype. 
In 14 of the 16 pairs of correlations, the 
judges’ predictions for particular Ss corre- 
lated higher with each S’s self-sort than did 
their predictions for the stereotype. Since the 
differential predictability of the Ss may be 
associated with these gains, they were evalu- 
ated by determining the probabilities for 
each S separately, and then determining the 
probabilities of the obtained set of four p 
values, via the chi-square transformation (4). 
The resulting p was less than .05. Since the
judges did not differ in accuracy, it was as- 
sumed in this analysis that for each judge, 
the four differences between stereotype and 
specific prediction were independent. 
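
The step of combining four separate p values “via the chi-square transformation” can be illustrated with Fisher's method, in which −2 Σ ln p is referred to chi-square with 2k degrees of freedom. That this is the exact procedure of reference (4) is an assumption, and the four p values in the usage line are hypothetical, not the study's.

```python
import math


def fisher_combined(p_values):
    """Combine independent p values by Fisher's method.

    Returns (chi-square statistic, degrees of freedom, combined p).
    The tail probability uses the closed form that holds when the
    degrees of freedom are even, as they always are here (2k).
    """
    k = len(p_values)
    statistic = -2 * sum(math.log(p) for p in p_values)
    half = statistic / 2
    tail = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))
    return statistic, 2 * k, tail


# Hypothetical per-subject p values, one for each of the four Ss.
statistic, df, combined_p = fisher_combined([0.30, 0.10, 0.20, 0.06])
print(df, combined_p < 0.05)
```

The point of the transformation is that several individually unimpressive probabilities can jointly be significant when they all lean the same way.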

On the picture material, the predictions 
for the actual Ss were not significantly better 
than the predictions for the stereotype. 

The data suggest that projection did not 
play a large part in the predictions of the 
judges. For the most part the judges ap- 
peared to be fairly accurate in evaluating the 
extent to which the Ss’ attitudes agreed or 
disagreed with their own, i.e., judges who had 
high correlations between their predictions 
for the S and their sorts for themselves were 
the ones whose own sorts were actually quite 
similar to those of the S, whereas judges 
whose own sorts were very different from 
those of the S had negative correlations be- 
tween their prediction for the S and their 
own sorts. It was also found that similarities 
or differences between a particular judge and 
a particular S, as reflected in the correlation
between their self-sorts, had no measurable
effect on the judge’s accuracy in predicting 
either type of material. 


Effects of Behavior Being Predicted 


There were no significant differences in the 
accuracy of judges’ predictions for verbal and 
for picture material. When interpreting this 
finding, however, it should be kept in mind 
that the verbal predictions represented a 
significant improvement over the stereotype 
whereas the picture predictions did not. The 
higher agreement between the predictions for
the stereotype and the Ss’ own sorts on the 
picture material may have resulted in part 
from the greater homogeneity of the Ss’ pic- 
ture sorts as compared with their verbal sorts. 
The intercorrelations between Ss on the pic- 
ture sorts ranged from .14 to .39, and on the 
verbal sorts from —.15 to .19. Since the 
stereotype represents a prediction for an av- 
erage, the greater the heterogeneity of the 
group, the poorer such an average becomes as 
a prediction for an individual S. Because of 
this difference in group heterogeneity on the 
two types of material, it is likely that a judge 
would have to improve considerably more 
over his prediction for the stereotype on the 
verbal material than on the pictures in order
to achieve an accuracy score of equal size on
both sorts. 


Effects of Subjects on Prediction 


There were significant differences (p= 
.01) between Ss in the accuracy with which 
the judges were able to predict their per- 
formances on both the verbal and picture 
material. One S was consistently predicted 
most poorly on both types of material. 


Discussion 


Perhaps the most unexpected finding of 
this study was that it made no difference in 
the judges’ predictions of attitudes about the 
self or picture preferences whether they in- 
terviewed an S directly, observed him, lis- 
tened to a recording of the interview, or read 
a verbatim transcript. However, these results 
are in close agreement with those of Segel 
(7) and Giedt (3). In an area where dis- 
agreements between experimental findings are 
common (9), such agreement is encouraging. 
Segel and Giedt found that judges predicted
almost equally well whether they saw a com- 
plete sound film of an interview, listened to 
it, or read a verbatim transcript. Giedt also 
reported that judges who saw a silent mo- 
tion picture predicted significantly more 
poorly. The failure of auditory and visual 
cues to result in improved predictions, and 
the significantly less accurate predictions 
when visual cues alone were used (3), 
strongly suggest that judges rely primarily on 
the information provided by verbal content 
when predicting conscious attitudes about the 
self. 

An important reason for the almost exclu- 
sive use of content cues by the judges is the 
way the predictive process was defined. In 
most of these studies, the judges were asked 
to predict how the Ss might perform on some 
verbal task, i.e., sentence completion or Q 
sort, where one would expect the content to 
provide the most cues. It is possible that if 
understanding were defined differently as, for 
example, the ability to predict the feelings an 
S might be experiencing at a particular mo- 
ment, other cues might be utilized to a far 
greater extent. 


Summary 


The primary purpose of the investigation 
was to discover which cues contribute most 
to one person’s ability to understand another 
in a diagnostic interview. Following a latin- 
square design, four clinical psychologists 
studied each of four male anxiety neurotics 
under one of the following conditions: di- 
rect interview, seeing and hearing the inter- 
view through a one-way screen, listening to a 
recording of the interview, and reading a ver- 
batim transcript. The psychologists were then 
asked to predict how each of the Ss had made 
a verbal Q sort consisting of self-descriptive 
items and a preference sort with pictures. The 
findings were: 

1. There were no significant differences in 
accuracy of prediction under the various experimental
conditions of direct interaction,
observation, listening to a recording, and
reading. The finding suggested that the cli- 
nicians in this study relied primarily on con- 
tent cues when making their predictions. 

2. The judges were found to be about 
equally skillful in their ability to predict be- 
havior. 

3. Although the judges relied heavily on 
their stereotypes of a “typical” anxiety neu- 
rotic in making their predictions, the patients’ 
verbal sorts were predicted more accurately 
by the judges’ specific predictions for indi- 
vidual patients than by their stereotypes. No 
such increase in accuracy was found for the 
picture sorts. 

Two factors appear to be involved in the 
relative accuracy of a judge’s specific predic- 
tions as compared to his sort for a typical 
patient. One is the judge’s familiarity with 
the predictive task. The other is the simi- 
larity between the behavior to be predicted 
and the behavior on which predictions are 
based. 


Received April 27, 1956. 
Some Comments on the Measurement of
Projection and Empathy


Bernard I. Murstein¹
The University of Texas M. D. Anderson Hospital and Tumor Institute 


The concepts of projection and empathy 
have spurred considerable research in recent 
years with conflicting and often confusing re- 
sults. Much of the difficulty stems from the 
methodological approach to the measurement 
of these variables. In a recent article (2), 
Norman and Leiding have examined the relationship
between the variables projection,
empathy, and refined empathy. These vari- 
ables were examined with regard to both in- 
dividual and mass empathy tests. The Dy- 
mond individual empathy test measures the 
ability of a person to predict the self-rating 
of another person. The mass empathy test re- 
fers to a given individual’s ability to predict 
the self-responses of a group of persons. The 
authors concluded that there was no signifi- 
cant relationship between mass and individual 
empathy tests, although within each test, 
significant correlations existed between the 
aforementioned variables. 

It is the purpose of this paper to examine 
the various significant correlations found by 
Norman and Leiding, and to demonstrate how
these correlations are, at least in part, spurious,
and thereby weaken one’s confidence in
the findings reported by these authors.

In demonstrating this spuriousness, it is 
convenient to describe the steps utilized by 
Norman and Leiding in measuring each variable.
Hence, an alphabetical shorthand (a, b,
c, d, ...) will be used in describing each
new step in order to make the comparison of
the various steps less cumbersome.


1 Now at Louisiana State University.


1. Dymond Individual Empathy Test

1a. Raw Empathy


(a) A rates B as he thinks B would rate
himself. (b) A rates himself (A) as he thinks
B would rate him. (c) B rates himself (B).
(d) B rates A as he (B) sees him.

A measure of A’s empathic ability is obtained
by calculating the discrepancy of A’s
predictions for B (steps a and b) from B’s
actual ratings (steps c and d). Thus Raw
Empathy is derived from the equation

Raw Empathy = (a − c) + (b − d)

where the total score is the sum of the discrepancies
without respect to sign.


1b. Projection


(a) A rates B as he thinks B would rate
himself. (b) A rates himself (A) as he thinks
B would rate him. (e) A rates B as he (A)
sees B. (f) A rates himself (A).

Projection = (a − e) + (f − b)


1c. Refined Empathy


Refined Empathy = Raw Empathy − Projection
= [(a − c) + (b − d)] − [(a − e) + (f − b)]


2. Norman and Leiding’s Mass Empathy Test

2a. Raw Empathy


(g) In the total group tested, 51 per cent
or more of a group answered one of the questions
of a personality inventory in a certain
way (e.g., “Yes”).

(h) On an alternate form, A answered in
the same direction when he was required to
judge about most other people. A Raw Empathy
point is counted each time g and h occur
simultaneously.

Raw Empathy = Σ(g, h)


2b. Projection


(h) A says most other people will answer
in a certain way.

(i) A answers in the same way in judging
himself (A) as he predicted that most other
people would answer a given question. Projection
occurs each time h and i occur together.

Projection = Σ(h, i)


2c. Refined Empathy

Refined Empathy = Raw Empathy − Projection
= Σ(g, h) − Σ(h, i)


The operations involved in correlating Raw
and Refined Empathy (r = .81) for Dymond’s
test are


Raw Empathy
(a − c) + (b − d)
vs.
Refined Empathy
(Raw Empathy − Projection)
[(a − c) + (b − d)] − [(a − e) + (f − b)]


It may readily be seen from the above no- 
tation, that these two measures of empathy 
might be expected to show a spurious posi- 
tive relationship due to the common com- 
ponents occurring when one correlates a whole 
with one of its parts. 

An examination of the operations involved
in correlating Refined Empathy and Projection
(r = −.47) yields the following notation:


Refined Empathy
[(a − c) + (b − d)] − [(a − e) + (f − b)]
vs.
Projection
[(a − e) + (f − b)]


In view of the position of the common
components above, it may be seen that as the
projection score increases, the common component
in the refined empathy score, which
must be subtracted, becomes increasingly
larger. Hence, a negative relationship would
be expected between Refined Empathy and
Projection even if no actual psychological relationship
existed.
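
The argument about shared components can be checked with a small simulation. In the sketch below the six ratings a through f are drawn independently at random, so no psychological relationship of any kind exists among them; the scores are then built exactly as in the notation above, taking discrepancies without respect to sign. The rating scale (1 to 5), the single-item scores, the sample size, and the seed are all arbitrary assumptions for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n_pairs=2000, seed=0):
    """Score n_pairs of raters whose six ratings are independent noise."""
    rng = random.Random(seed)
    raw, projection, refined = [], [], []
    for _ in range(n_pairs):
        a, b, c, d, e, f = (rng.randint(1, 5) for _ in range(6))
        raw_score = abs(a - c) + abs(b - d)    # Raw Empathy
        proj_score = abs(a - e) + abs(f - b)   # Projection
        raw.append(raw_score)
        projection.append(proj_score)
        refined.append(raw_score - proj_score)  # Refined Empathy
    return pearson(raw, refined), pearson(refined, projection)


r_raw_refined, r_refined_projection = simulate()
print(r_raw_refined > 0, r_refined_projection < 0)
```

Even with purely random ratings, Raw and Refined Empathy correlate positively and Refined Empathy and Projection correlate negatively, which is the pattern of signs the method of scoring alone would produce.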

The operations which result in an r of .86
between the Raw Empathy and Projection
variables within the Mass Empathy Test of
Norman and Leiding are as follows:


Raw Empathy
Σ(g, h)
vs.
Projection
Σ(h, i)


Since operation h was common to both variables,
the resulting correlation contained a
degree of positive spuriousness.

The last significant correlation is a negative
one (r = −.69) between Refined Empathy
and Projection. The operations are


Refined Empathy
Raw Empathy − Projection
vs.
Projection
Σ(h, i)


Again, one may note that as the Projection
score increases, the Refined Empathy score
will drop solely on the basis of the method of
measurement. Thus, a spurious negative relationship
is to be expected.

The study of such concepts as projection
and empathy should be of utility in understanding
human behavior and in predicting
behavior. Unfortunately, complicated methodologies
involving the manipulation of sundry
discrepancy scores are not only difficult
to justify on logical grounds, but often contain
spurious components which make the
end scores invalid. It is possible, of course, to
partial out these spurious elements as Calvin
and Holtzman (1) have done. It seems more
parsimonious, however, to construct a measuring
instrument which operationally defines
the concept to be examined without being
statistically inappropriate.

Self Concept and Defensive Behavior
in the Maladjusted


Joseph S. Hillson
Norfolk State Hospital

and Philip Worchel
University of Texas


Considerable light has been thrown on the 
nature of the personality dynamics in the 
maladjusted by the many recent investiga- 
tions of the self concept. In order to evaluate 
the implications of the results of these stud- 
ies on self theory, it is necessary to distin- 
guish two phases in the development of mal- 
adjustment: the arousal of anxiety and the 
development of defensive behavior. Self the- 
ory (11) contends that tensions arise when 
the organism strives to satisfy needs not 
consciously admitted and to respond to ex- 
periences denied by the conscious self. Anx- 
iety is felt when the individual is aware of 
this tension or discrepancy. Sullivan (14), in 
similar fashion, states that anxiety appears 
when anything spectacular happens that is 
not welcome to the self. Defensive behavior 
develops in order to maintain the structure 
of the self, and as Rogers (11) suggests, the 
more perceptions of experiences inconsistent 
with the concept of self there are, the more 
rigid is the organization of the self-structure. 
When the self cannot defend itself any longer 
against deep threats, a psychological break- 
down or disintegration occurs. Hogan (10) 
describes eight steps in the pattern of threat 
and defense. Anxiety is reduced by denial or 
distortion of perceived experience. 

In general, the investigations thus far have 
dealt with the nature of the self-structure in 
subjects who are anxious and aware of their
maladjustment. Maladjustment in these stud- 
ies has been defined by extreme scores on 
some measure of maladjustment (2, 3, 6, 9) 
or by voluntary requests for assistance in the
solution of personal problems (4, 8, 13). If,
as suggested by Hogan (10), defensive be- 
havior reduces the awareness of threat, then 
maladjusted subjects who have developed de- 
fense patterns should no longer admit incon- 
gruity between perceived experience and their 
self concepts. Thus it is hypothesized from 
self theory, that 


(a) schizophrenic subjects characterized by 
defensive patterns deny any incongruity be- 
tween the self and ideal concepts, and 

(b) neurotic subjects with anxiety reac-
tions report an inconsistency between their
self and ideal concepts.


Further hypotheses concerning the nature 
of the self concept in the maladjusted are de- 
rived from Adlerian theory on the dynamics 
of the neuroses. The neurotic, according to 
Adler (1), sets up fictitiously high goals be- 
cause of intense feelings of inferiority and 
abnormal need for power. These goals, being 
unrealistic, are unobtainable, and failure to 
achieve them results in increased feelings of 
anxiety and inferiority. Therefore, it is pre- 
dicted that 


(c) neurotic subjects depreciate themselves, 

(d) schizophrenics, who have developed de- 
fense patterns, rate themselves at least as 
well as normal subjects, and 

(e) the ideal concept is higher for the neu- 
rotic than for the normal or adjusted person. 


It is proposed to test the above hypotheses in 
the present study. 







Table 1

Characteristics of the Subjects in Each Group

                          Sex              Age         Educ. (Grade)
Group            N    Male  Female    Mean     SD      Mean     SD
Normals          47    24     23      25.33    5.00    13.51    1.38
Neurotics        37    19     18      30.26   10.30    12.25    3.90
Schizophrenics   36    14     22      30.46    8.06    11.95    2.18

Method


Three groups of subjects were selected for
the present investigation. The normal group
consisted of 47 students who were not cur-
rently under treatment for emotional disturb-
ances and who had never been under such
treatment. They were largely drawn from col-
lege sophomores and freshman nurses. The
group representing subjects with some overt
or reported anxiety about their condition con-
sisted of 37 neurotic subjects currently under
treatment for an emotional disturbance either
on an inpatient or outpatient basis. Persons
whose diagnosis included some form of char-
acter disorder were not included in this group.
They were selected from two University Clin-
ics and three hospitals. The third group con-
sisted of 36 schizophrenic patients, either
mixed or paranoid type, none of whom had
been hospitalized for more than six weeks. In
general, this group was characterized by so-
matic delusions and delusions of persecution.

Table 1 gives the means and standard de-
viations for age and education for the three
groups as well as the number of males and
females in each group. It will be noted that
the groups, in general, are fairly well equated
on all these variables. In addition, we tried
as closely as possible to secure subjects from
the same socioeconomic class.


1 The cooperation of Dr. H. T. Manuel of the
Testing and Guidance Bureau, Dr. Paul L. White of
the Health Center, University of Texas, Dr. S. Gold-
stone, Baylor Medical College, and Dr. A. Foster,
Galveston State Psychopathic Hospital, in the se-
lection of subjects is gratefully acknowledged.


Instrument and Procedure


The Self-Activity Inventory (SAI) devel-
oped by Worchel for the USAF was employed
in the present study. The Inventory is com-
posed of 54 statements describing responses
to the arousal of hostility, achievement, 
sexual, and dependency needs. Almost all the 
responses selected, however, are considered 
ineffectual since they are likely to precipitate 
conflict with social requirements, or not re- 
duce the conflicts involved. To measure the 
intensity of the responses, the S is asked to 
indicate on a 5-point scale, from 1 indicat-
ing never to 5 indicating always, how much
of the time the activity described is like him 
(Self), how he would like to be (Ideal), and 
how it is like other people (Other). Thus, 
the higher the sum of the scores on any of 
the three columns, the more frequent are the 
ineffectual responses employed. A low score 
represents the positive self-attitude or the ad- 
justed end of the continuum while a high 
score represents the negative or maladjusted 
extreme. Some of the items on the Inventory 
are: 


1. Feels he must win in an argument;

2. Plays up to others in order to advance his position;

3. Refuses to do things because he is not good at them;

4. Tries hard to impress people with his ability.


The total discrepancy score for each sub-
ject between self and ideal was obtained by
summing the absolute discrepancy scores for
each item, while the discrepancy score be-
tween self and other concepts was the alge-
braic difference between the total scores on
each of the two concepts. The algebraic dif-
ference between self and other was calculated
in order to show the direction of the differ-
ence between self and other person.


2 This study was supported in part by funds pro-
vided under Contract AF 18(600)-913 with the
USAF School of Aviation Medicine, Randolph Field.
Correspondence concerning the use of the SAI should
be addressed to Dr. Saul B. Sells, Department of
Clinical Psychology, Randolph Field, Texas.

Thus the SAI provides one measure of the 
integrity of the self system, namely, the ab- 
solute difference between self and ideal con- 
cepts (Column I minus II) and two meas- 
ures of depreciation, the magnitude of the 
rating on the self (Column I) and the mag- 
nitude and direction of the difference between 
self and other person (Column I minus III). 
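The three measures just described can be sketched in code. This is only an illustration of the scoring rules as stated in the text; the function name and the four-item example are hypothetical, and the real Inventory has 54 items per column.

```python
def sai_scores(self_ratings, ideal_ratings, other_ratings):
    """Score one subject's Self (I), Ideal (II), and Other (III) columns.

    Each argument is a list of item ratings on the 1-5 scale
    (1 = never, 5 = always), with one rating per item.
    """
    columns = (self_ratings, ideal_ratings, other_ratings)
    if len({len(col) for col in columns}) != 1:
        raise ValueError("all three columns must rate the same items")
    for col in columns:
        if any(r not in (1, 2, 3, 4, 5) for r in col):
            raise ValueError("ratings must be integers from 1 to 5")

    self_total = sum(self_ratings)
    ideal_total = sum(ideal_ratings)
    other_total = sum(other_ratings)

    # Self-ideal: sum of the absolute item-by-item discrepancies.
    self_ideal = sum(abs(s - i) for s, i in zip(self_ratings, ideal_ratings))
    # Self-other: algebraic difference of the column totals (I minus III),
    # so the sign shows the direction of the difference.
    self_other = self_total - other_total

    return {
        "self": self_total,
        "ideal": ideal_total,
        "other": other_total,
        "self_ideal": self_ideal,
        "self_other": self_other,
    }


if __name__ == "__main__":
    # Hypothetical four-item protocol, for illustration only.
    print(sai_scores([3, 4, 2, 5], [1, 2, 2, 3], [4, 4, 3, 5]))
```

Because higher ratings mean more frequent ineffectual responses, a positive self-other value indicates that the subject rates himself less favorably than he rates other people.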


Results 
Self-Consistency 


Significant differences in the discrepancy 
between self and ideal were predicted among 
the three groups of Ss. In order to perform 
the analysis of discrepancy scores between 
Self and Ideal (S-I), it was necessary to cor- 
rect the discrepancy scores to take into ac- 
count the fact that the size of the discrep- 
ancy is partly a function of the Self scores. 
To correct the obtained discrepancy scores, 
a scattergram was prepared for (S) against 
(S-I). The scattergram indicated clearly that 
the regression line for predicting discrepancy 
scores from (S) was linear and that the ar- 
rays were relatively homoscedastic. The prod- 
uct-moment correlation coefficient between 
(S) and (S-I) for all groups combined was 
0.74. The correlation was sufficiently high to 
warrant adjustment of the obtained discrep-
ancy scores. Each discrepancy score was cor-
rected by subtracting the predicted discrep-
ancy score from the obtained discrepancy
score. The data for the corrected scores,
(S-I)c, together with the data on self, ideal,
and (S-O)c are summarized in Table 2. Our
hypotheses predicted that the discrepancy
score for the neurotic would be greater than
that of the normals and paranoid schizo-
phrenics, and the discrepancy score for the
schizophrenic would be at least equal to that
of the normals.


Table 2

Means and SDs of the Concept Scores and of the
Corrected Discrepancy Scores for all
Groups of Subjects

                 Normal          Neurotic       Schizophrenic
Measure         M      SD        M      SD        M      SD
Self          132.7   15.7     151.8   30.8     139.1   24.2
Ideal          98.8   15.3     103.6   23.4     108.0   19.5
Other         157.3   17.2     146.4   24.9     152.0   24.2
(S-I)c         -3.2   16.6       5.8   21.9      -6.2   20.7
(S-O)c        -10.2   16.6       7.9   25.5       3.5   24.5


Table 3

Tests of the Significance of the Difference Between the
Means of the Concept and Discrepancy
Scores on the SAI

               Normal        Normal           Neurotic
                 vs.           vs.               vs.
Measure       Neurotic    Schizophrenic    Schizophrenic
Self           3.43**        1.38             2.00*
Ideal          1.08          2.34**           0.87
Other          2.26**        1.19             1.04
(S-I)c         2.07*         0.76             2.54**
(S-O)c         3.75**        3.16**           0.80

* Significant at the .02 level.
** Significant at the .01 level.
Note.—One-tailed tests of significance presented.
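The correction of the discrepancy scores described above amounts to taking residuals from a linear regression of the discrepancy score on the Self score. The following is a minimal sketch under that reading; the function name is hypothetical, the regression is fitted by ordinary least squares on all subjects combined, and no data from the study are reproduced.

```python
def corrected_discrepancies(self_scores, discrepancies):
    """Return obtained minus predicted discrepancy for each subject.

    The prediction comes from a least-squares line fitted to
    (self score, discrepancy score) pairs for all subjects combined.
    """
    n = len(self_scores)
    if n != len(discrepancies) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 subjects")

    mean_s = sum(self_scores) / n
    mean_d = sum(discrepancies) / n
    sxx = sum((s - mean_s) ** 2 for s in self_scores)
    if sxx == 0:
        raise ValueError("self scores must vary to fit a regression line")
    sxy = sum((s - mean_s) * (d - mean_d)
              for s, d in zip(self_scores, discrepancies))

    slope = sxy / sxx
    intercept = mean_d - slope * mean_s

    # Corrected score = obtained discrepancy - predicted discrepancy.
    return [d - (intercept + slope * s)
            for s, d in zip(self_scores, discrepancies)]
```

A positive corrected score then means a discrepancy larger than the subject's Self score alone would predict, and a negative one a smaller discrepancy, which is how the group means of the corrected scores are read in the text.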

Table 3 presents the t ratios of the differ-
ences between the means of the (S-I)c, self,
ideal, and (S-O)c scores. The difference be-
tween normals and neurotics on the (S-I)c
scores is 9.02 which, for a single-tailed test,
is significant at the .02 level. This differ- 
ence indicates that the neurotics perceive a 
greater discrepancy existing between their 
Self and Ideal than do the normals when the 
effect of the self-rating is partialled out, as 
was predicted from theory. 

The only other significant difference at the 
.01 level is that between the psychotic and 
neurotic. As was predicted, the direction of 
the difference is such as to indicate that the 
psychotics perceive a smaller discrepancy be- 
tween Self and Ideal than do the neurotics. 


Self-Depreciation 


It was predicted that our neurotic subjects 
would evaluate themselves more unfavorably 
than the normals while the schizophrenics 
would rate themselves at least as favorably 
as the normal subjects. On the SAI, there- 
fore, the neurotics should have a significantly 
higher score (low self-evaluation) on Column 
I (Self) and a higher positive discrepancy 
between Columns I and III (Self-Other)
than normals, and the schizophrenics should
have a score on Self (I) and discrepancy be-
tween self and other, at least equal to that
of normals.

Table 2 contains the means and standard 
deviations for the Self and the corrected dis- 
crepancy scores between Self and Other, for 
each of the three groups. Inspection of the 
raw data indicated that the distributions ap- 
proached a normal curve sufficiently so that 
t tests of differences between the means could
be computed. Since the groups were unequal
and the variances heterogeneous on the Self
scores, a formula for t was used which takes
into consideration the variances of both sam- 
ples as estimates of the population variance. 
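The exact formula is not given in the text. As a sketch, one common form for unequal groups with heterogeneous variances uses an unpooled standard error, with each sample variance divided by its own N. That choice is an assumption, and the function name is hypothetical. Applied to the Self means and SDs in Table 2, it gives values close to the Self ratios reported in Table 3.

```python
import math


def t_unpooled(mean1, sd1, n1, mean2, sd2, n2):
    """t ratio for two independent groups using separate variances.

    Assumption: standard error = sqrt(sd1^2 / n1 + sd2^2 / n2),
    i.e. the variances are not pooled.
    """
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean2 - mean1) / standard_error


if __name__ == "__main__":
    # Self scores from Table 2: (mean, SD, N) for each group.
    normal = (132.7, 15.7, 47)
    neurotic = (151.8, 30.8, 37)
    schizophrenic = (139.1, 24.2, 36)

    print(f"normal vs. neurotic:      {t_unpooled(*normal, *neurotic):.2f}")
    print(f"normal vs. schizophrenic: {t_unpooled(*normal, *schizophrenic):.2f}")
```

Under this assumption the two comparisons come out at about 3.44 and 1.38, against the tabled 3.43 and 1.38. The small difference is consistent with rounding in the published means, but it does not establish that this was the formula actually used.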

Table 3 shows the results of the tests of 
significance of the differences between the 
means of the Self and Self-Other scores for 
each group compared with the normals and 
with each other. The difference between the 
mean Self score of the normals and neurotics 
is 19.08 which is significant beyond the .01 
level (t ratio of 3.43). Compared to the nor-
mals, therefore, the neurotic rates himself
more negatively, which confirms the predic- 
tion on the depreciated self-picture of the 
neurotic. 

The difference between the means of the 
normals and schizophrenics on the Self scores 
is 6.40 (t of 1.38) which, for a one-tailed
test, is not significant at the .05 level. Thus 
the schizophrenic perceives himself as posi- 
tively as the normal person does, which is 
what was predicted. 

The difference between the means of the 
neurotics and schizophrenics, as we would ex-
pect, is 12.70 (t of 2.00) which, for a one-
tailed test, is significant beyond the .02 level.
The direction of the difference indicates that 
neurotics see themselves more negatively than 
do schizophrenics. 

From Table 3 it is clear that both malad- 
justed groups differ significantly from the 
normal group regarding the discrepancy be- 
tween their self concept and their concept of 
others. The differences between the means of 
the normals and neurotics and between the 
normals and schizophrenics are significant be-
yond the .01 level. The direction of the dif-
ferences in both instances was the same, in-
dicating that the two maladjusted groups
have a greater discrepancy between their
self concepts and their concept of others than
do normals when the effect of their own self-
rating is partialled out. This finding is in line
with our hypothesis of depreciation in neu- 
rotics but opposite to what we predicted for 
the schizophrenics. If anything, the defensive 
pattern of projection in the schizophrenic 
would lead us to expect the smallest discrep- 
ancy between the self and other in the psy- 
chotic. 


The Ideal 


It was predicted from Adlerian theory that 
the neurotic would have a higher “ideal” 
(lower score on Column II) than that of the 
normals. No predictions could be made for 
the schizophrenic. The results show that 
there was no significant difference between 
the mean “ideal” scores of the neurotics and 
normals (Table 3). The difference between 
the normals and schizophrenics of 9.20, how-
ever, is significant at the .01 level (t of
2.34). The direction of the difference is such 
as to indicate that the schizophrenics have a 
lower ideal relative to that of normals. 


Discussion 
The Neurotic 


Taken as a group, the neurotic patient sees 
himself as employing behavioral patterns that 
are much more frequently ineffectual in meet- 
ing his needs than the normal person. His de- 
sired goals are no higher than those of the 
normal, but he sees other people as being 
more effective in meeting their needs than the 
normal sees them. When we rule out differ- 
ences in self-perception, between him and the 
subjects in the other two groups, he has a 
greater self-ideal discrepancy than the others, 
that is, he secures a self-ideal discrepancy 
significantly greater than that predicted by 
his self score. In addition, he is more self- 
depreciative. Calvin and Holtzman (5) also 
found that the tendency to enhance the self 
is inversely related to maladjustment in a 
group of college subjects. Thus we have a 
person who not only does not measure up to 
his ideals but sees himself as inferior to the 
average person. As a matter of fact, his group 
is the only one of the three groups whose
mean self-score was larger than the “other”
score. He is, in reality, a “miserable” person. 
These findings tend to confirm Adler’s view 
(1) of the neurotic with one modification. 
Adler pictures the neurotic as developing in- 
tense inferiority feelings with an overdevelop- 
ment of the need for power. These factors 
plus the underdevelopment of “community 
feeling” lead to the setting up of fictitiously 
high goals. The “ideal,” however, is ficti- 
tiously high not relative to others but only 
when compared to the evaluation of the self. 
The goals which are set are “fictitious” in re- 
lationship to what is perceived as accom- 
plished. Self-ideal discrepancy and self-other 
depreciation go hand in hand in the neurotic. 


The Schizophrenic 


As was predicted, there was no difference 
between the normals and schizophrenics on 
self-consistency. On self-depreciation, how- 
ever, one of the measures on the SAI (Self) 
confirms the hypothesis that there is no dif-
ference between the normals and schizo-
phrenics. On the rating of Self, the schizo-
phrenics rate themselves as being as effective
in their need-satisfaction patterns as the nor- 
mals but more effective than the neurotics. 
By implication, we would have expected the 
schizophrenics to possess the least effective 
adjustmental patterns. Thus, as was sug- 
gested in the hypotheses, his self-ratings are 
probably a result of defensive distortion. 

On the other measure of self-depreciation, 
the corrected discrepancy score between self 
and other person, the results were contrary 
to our prediction. The schizophrenics rated 
themselves more depreciatively than the nor- 
mals. This depreciation was due to the fact 
that the patients tended to enhance other 
people (Column III) relative to self more 
than do the normals. 

On the ideal, however, the psychotic sub- 
ject has a significantly lower aspiration level 
than the normal person. He prefers to behave 
less effectively. This lowering of the ideal self 
could be an extension of the defensive distor-
tion of the self concept. This defensive com-
bination, distortion in self-appraisal and
lowering of the ideal self, enables the schizo- 
phrenic to enhance himself relative to his 
ideal and thus avoid the anxiety arising from 
a discrepancy in the self. The obtained dis-
crepancy score, as we would expect, there-
fore, is smaller than that predicted by his
self-rating. Thus one can readily understand 
the lack of anxiety and the absence of motiva- 
tion for therapy in the schizophrenic patient. 


Summary 


On the basis of self theory and Adlerian 
theory, the following hypotheses concerning 
the nature of the self-system in maladjusted 
subjects presenting anxiety and defensive pat- 
terns were proposed for the present investi- 
gation: (a) Maladjusted subjects character- 
ized by anxiety would present a depreciated 
self picture, report high ideals, and show a 
high discrepancy between self and ideal con-
cepts; (b) Maladjusted subjects with defen-
sive patterns would show little discrepancy
between self and ideal and would present a 
picture of the self similar to that of normals. 

A Self-rating inventory (SAI) consisting of 
54 statements of need-satisfaction patterns 
was administered to 47 normals, 37 neurotics 
(the “anxious” group), and 36 schizophrenics 
(group with defensive patterns). The inven- 
tory yielded Self, Ideal, and Other scores for 
each subject. In addition, corrected discrep- 
ancy scores between Self and Ideal, and be- 
tween Self and Others were computed. The 
results show that: 


1. The neurotic group gave significantly 
poorer self-appraisals than the other two 
groups. The normals and schizophrenics gave 
practically similar self-appraisals. 

2. On the ideal, the neurotic was not sig- 
nificantly different from the normals, but the 
schizophrenics set their level significantly 
lower than that of the normals. 

3. When the effect of the self-rating is par- 
tialled out, the self-ideal discrepancy for the 
neurotics is significantly greater than that for 
the normals and schizophrenics. There was no 
difference between schizophrenics and nor- 
mals. 

4. On the corrected self-other discrepancy, 
the normals differed significantly from the 
two maladjusted groups. Whereas they tended 
to enhance themselves, relatively speaking, 
the maladjusted groups tended to depreciate 
themselves when the effect of the self-rat-
ing was partialled out. There were no differ-
ences in this regard among the maladjusted
groups. 


Received May 1, 1956. 
Another Application of the Spiral Aftereffect
in the Determination of Brain Damage 
H. A. Page, G. Rakita, 


University of Wisconsin 


H. K. Kaplan, and N. B. Smith 
Mendota State Hospital 


This study represents another attempt to
differentiate patients diagnosed as
organic from those without such pathology in
evidence. Price and Deabler (3) have re-
ported considerable success with the use of
an Archimedes Spiral in spotting patients 
with organic brain involvement. On the basis 
of earlier studies (1, 4) they hypothesized 
that organic Ss would be relatively incapable 
of perceiving a negative figural aftereffect. 
The S is exposed to a rotating spiral which 
is then stopped. The negative figural after- 
effect is evidenced in the impression that the 
motionless spiral is turning in the opposite 
direction, changing in size, or moving back- 
ward or forward in space. Their results dra- 
matically support this hypothesis in that none 
of the nonorganic normal or psychiatric con-
trol Ss failed to give some evidence of a nega-
tive figural aftereffect while 60% of the or-
ganic Ss indicated a complete absence of the
effect. In addition, over 90% of the control 
Ss reported a “full” effect while only 2% of 
the organic group received a comparable rat- 
ing on the basis of their verbal reports. 

The present study was undertaken in the 
interest of providing an additional test of the 
diagnostic power of the spiral aftereffect and 
to do so within the context of a carefully 
matched control group. The authors were 
concerned with the possibility that differences 
between organic and control groups in such 
factors as age, intelligence, and chronicity or 
length of hospitalization might have operated 
to enhance the differences noted in the ear- 
lier investigation. 


Method 
Apparatus 


The apparatus was essentially similar to 
that described by Price and Deabler. Differ- 
ences involved the use of an 8-inch disc rather 
than a spiral of 6-inch diameter. The disc 
was driven by a spring-powered phonograph 
motor which permitted a speed of 100 r.p.m. 
Such an arrangement seemed preferable in 
terms of its light weight and the independ- 
ence of electric outlets. The motor was 
mounted in a gray wooden cabinet 17 inches 
high and 19 inches wide. 


Procedure 


Initial instructions to Ss and their place- 
ment with respect to the apparatus were 
identical to those described by Price and 
Deabler. However, one spiral rather than two
was employed, and Ss were administered
three rather than four trials. The spiral pro-
vided an illusion of expansion while in mo-
tion with a negative figural aftereffect of
backward movement or shrinking when rota- 
tion ceased. On each trial the spiral was set 
in motion for 30 seconds. The S was asked 
what the line appeared to be doing. The disc 
was then stopped and any verbalization by S 
to the effect that the disc was moving back- 
ward in space, shrinking, or revolving in re- 
verse was accepted as evidence of a negative 
figural aftereffect. If S gave such an indica- 
tion, he was requested to indicate when the 
aftereffect was no longer apparent. The time 
was recorded. 
Table 1

Comparison of Organic and Control Groups on Matching Variables

                            Organic                    Control
Variable               Mean    Sigma   Range      Mean    Sigma   Range
Age                    47.25   14.56   25-72      46.85   14.51   21-68
Education (Years)      10.63    3.79    6-16       8.75    2.41    3-12
Length of
Hospitalization         3.83    4.92  .08-15       3.90    4.68  .08-15





Subjects 


All Ss were patients at the Mendota State 
Hospital, Madison, Wisconsin. The organic 
group consisted of 20 patients (10 male, 10 
female). Considering the primary diagnosis, 
there were seven cases of cerebral arterio- 
sclerosis, five prefrontal lobotomies, three 
convulsive disorders, two Korsakoff’s syn- 
drome, one traumatic brain damage, and two 
cerebral vascular accident. All of the patients 
were considered to be suffering from cortical 
damage. 

The control group consisted of 20 pa- 
tients (10 male, 10 female), all of whom 
were free of any pathology suggestive of or- 
ganic brain involvement. Although 4 of these 
patients had received electroshock or insulin 
coma treatment at some time, none had been 
subjected to such therapy within a five-month 
period prior to the study. Twelve patients 
were diagnosed as schizophrenic, three as de- 
pressed, one as alcoholic, two as neurotic, and 
two as paranoid states. Table 1 presents a 
comparison of the organic and control groups 
on the variables of age, education, and length 
of hospitalization. The groups did not differ
significantly in either mean values or vari-
ance for any of the three variables.


Table 2

Frequency of Reported Figural Aftereffect in
Organic and Control Groups

Trial                       Organic   Control   Probability
Trial 1                        6         15        <.02
Trial 2                        5         17        <.01
Trial 3                        5         16        <.01
Total (aftereffect
reported on at least
one trial)                     8         17        <.01


Results 


Analysis was made of the difference in the 
proportion of Ss in the organic and control 
groups for each of the three trials as well as 
the total trials. This analysis is presented in 
Table 2. It will be noted that significant chi 
squares were obtained in comparing the inci- 
dence of the aftereffect across the two groups 
for all three trials and total trials. 
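The comparisons in Table 2 can be sketched as chi-square tests on 2 x 2 tables (group by aftereffect reported or not reported). The text does not say whether a continuity correction was applied, so the use of Yates' correction here is an assumption, and the function name is hypothetical. Each group contained 20 Ss, so the non-reporting counts are 20 minus the tabled frequencies.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square (1 df) and its p value for the 2 x 2 table [[a, b], [c, d]].

    With yates=True, the continuity correction subtracts N/2 from
    |ad - bc| before squaring (an assumption about the original analysis).
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * diff ** 2 / denominator
    # For 1 df, P(chi-square > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    group_size = 20
    # (organic reporting, control reporting) taken from Table 2.
    table_2 = {"Trial 1": (6, 15), "Trial 2": (5, 17),
               "Trial 3": (5, 16), "Total": (8, 17)}
    for label, (organic, control) in table_2.items():
        chi2, p = chi_square_2x2(organic, group_size - organic,
                                 control, group_size - control)
        print(f"{label}: chi-square = {chi2:.2f}, p = {p:.4f}")
```

With the correction, Trial 1 gives a chi-square of about 6.42, which falls between the .02 and .01 levels, in agreement with the tabled probability for that trial.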

A comparison was made between the two 
groups in terms of the magnitude of the ef- 
fect as determined by Ss’ verbal reports sug- 
gestive of its duration. A Mann-Whitney U 
test failed to reject the null hypothesis indi- 
cating that differentiation between the groups 
could not be accomplished by considering the 
reported length of the aftereffect. In addition 
rank correlations, tau, were run for the con- 
trol Ss between the magnitude of the effect, 
on the one hand, and age, educational level, 
and length of hospitalization on the other. 
None of these correlations was demonstrated 
to be statistically reliable. 

It may be of interest to note that of the 
five cases of prefrontal lobotomy included in 
the organic group, three reported the after- 
effect. This incidence of aftereffect places 
these patients midway between the control 
group and the remaining brain-damaged pa- 
tients. 

Discussion 


The principal findings of this research sup- 
port those reported by Price and Deabler. 
However, these results fail to attribute to the 
spiral aftereffect a discriminatory ability of
the power suggested by their findings. It is
suggested that the replication of differentia- 
tion between organic and control groups adds 
more weight to the theoretical importance of 
the measure. In fact, the incidence of no ef- 
fect in the organic group is exactly the same 
as noted in the previous study. The practical 
significance of the spiral aftereffect though is 
diminished somewhat on the basis of the cur- 
rent findings. In the clinical setting, the diag- 
nosis of organicity must typically be made 
among individuals who resemble one another 
in such characteristics as age, socioeconomic 
background, cooperativeness, and so forth. If 
one considers the control group employed in 
this research to be representative of the kind 
of patients typically involved in such dis- 
criminations, it is apparent that the overlap 
between the groups would frequently result 
in false negatives and less frequently in 
false positives. 

Considering the test on an all-or-none ba- 
sis, that is, the effect is reported or is not re- 
ported, our data would suggest that 40% of 
the organic group would not be so identified. 
Conversely, some 15% of the nonorganic pa- 
tients would be inaccurately described as 
suffering cranial damage. 
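The two error rates quoted above follow directly from the "Total" row of Table 2. A minimal sketch of the arithmetic, with a hypothetical function name, treating a reported aftereffect as a "nonorganic" sign and its absence as an "organic" sign:

```python
def error_rates(organic_reporting, organic_n, control_reporting, control_n):
    """Return (false negative rate, false positive rate) for the all-or-none rule.

    A false negative is an organic S who reports the aftereffect;
    a false positive is a control S who does not report it.
    """
    false_negative_rate = organic_reporting / organic_n
    false_positive_rate = (control_n - control_reporting) / control_n
    return false_negative_rate, false_positive_rate


if __name__ == "__main__":
    # Total row of Table 2: 8 of 20 organic Ss and 17 of 20 control Ss
    # reported the aftereffect on at least one trial.
    fn, fp = error_rates(8, 20, 17, 20)
    print(f"organic Ss missed:        {fn:.0%}")
    print(f"control Ss misclassified: {fp:.0%}")
```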

It is conceivable that the measurement of 
the spiral aftereffect could be improved con- 
siderably if it were not dependent upon a 
subjective verbal report. The authors are cur-
rently attempting to develop a procedure
which more nearly parallels the nonverbal 
procedures utilized in the determination of 
the kinesthetic figural aftereffect. 


Summary 


Significant differences were obtained be- 
tween a group of patients suffering cortical 
brain damage and a matched group of pa- 
tients with a functional diagnosis in the per- 
ception of a negative spiral figural after- 
effect. Organic patients were less likely to re- 
port the effect. The results were interpreted 
as providing additional support to the theo- 
retical implications of this measurement, but 
as providing less evidence that the aftereffect 
may serve as an effective diagnostic device. 


Received April 24, 1956. 
New Tests


Brainard, Paul P., & Brainard, Ralph T. Brainard 
Occupational Preference Inventory. High school- 
adult. 1 form. Untimed, (30) min. Booklet ($5.50 
per 25), with key, and manual, pp. 12; IBM an- 
swer sheet ($1.60 per 25); specimen set (60¢). 
New York: Psychological Corp., 1945, 1956. 

The Brainard inventory, which has been under in-
termittent development since 1922, received some re-
vision and completely new norms in 1956. It sam-
ples six fields of occupational interest for each
examinee: commercial, mechanical, professional, es- 
thetic, and scientific (for both sexes), agricultural 
(for boys only), and personal service (for girls 
only). Each field is tapped by 20 items which are 
rated on five-point scales of liking. The reading diffi- 
culty of the items has been kept at a low level. The 
reliabilities of the individual scales, for grades 10 and 
12, range from .71 to .88 by retest, and from .82 to
.95 by odd and even items. Intercorrelations of the
scores are mainly low, indicating a fair degree of in- 
dependence. Validity, aside from a few correlations 
with the Kuder, is defended mainly from content; 
no predictive studies or scores of adults in occupa- 
tions are cited. The impressive new norms, obtained 
in 1955 and 1956, are based on 4,855 boys and 4,840 
girls in grades 8 to 12. In spite of a few obvious 
shortcomings, the Brainard will probably remain a 
serviceable tool in the hands of skillful high school 
counselors. The present version is a considerable im-
provement over earlier ones.—L. F. S.


Cooperative School and College Ability Tests
(SCAT). Examiner’s manual: First Supplement/
1956, pp. 11; Second Supplement/1956, pp. 6.
Princeton, N. J., & Los Angeles, Calif.: Educa-
tional Testing Service, 1956.


Fulfilling the promises made when the tests were
first issued, the supplements provide additional nor-
mative data on the SCAT (see J. consult. Psychol.,
1956, 20, 160). The First Supplement gives fall norms
for grades 10, 11, and 12, data relevant to fall norms
for college freshmen, and correlations between the
SCAT and the ACE. The Second Supplement deals
with the prediction of scores on the College En-
trance Examination Board’s SAT from the SCAT.—
L. F. S.










Flanagan, John C. Flanagan Aptitude Classification 
Tests (FACT). Supplementary manual: Interpret- 
ing Test Scores, pp. 12. Chicago: Science Research 
Associates, 1956. 

When they were first published in 1953, it was 
quite evident that FACT needed occupational norms 
based on the scores of persons tested as students who 
had later entered various fields of work (see J. con- 
sult. Psychol., 1954, 18, 231-232). The present book- 
let presents data which begin to fulfil this require- 
ment. Pittsburgh students tested in the standardiza- 
tion of the test were followed into 23 occupations 
requiring little preparation, and into training courses 
for 19 occupations of higher level. Data on those suc- 
cessful and satisfied in work or training provide 
tentative occupational standards. For example, an 
occupational stanine of 7 seems needed for the study 
of engineering, but one of 3 often suffices for work 
as a salesperson. The present standards, while use-
ful, are properly labeled as tentative.—L. F. S.


Gorham, Donald R. Proverbs Test. Grade 5-adult. 
3 forms, clinical form; 1 form, best answer form. 
Untimed, (20-40) min. Test blank, clinical forms 
I, II, III (10 ea.); scoring card for each clinical
form; test booklet, best answer form (1); answer
sheet (25); scoring stencils; general manual, pp.
12; clinical manual, pp. 17; complete kit ($4.50 
for quantities indicated; replacements available). 
Grand Forks, N. D.: Psychological Test Special- 
ists, 1956. 

Proverbs tests, and their close relatives the fables, 
have been used by psychology and psychiatry for at 
least 50 years. Gorham’s versions have received a 
reasonable degree of validation and standardization. 
Although by no means without faults when judged 
by the most rigorous technical standards for test 
construction, they seem significantly better than the 
off-hand versions frequently used. In the “clinical” 
forms of the Proverbs Test, the examinee writes what 
each of 12 proverbs means. Three well-equated 
“clinical” forms permit lengthening the test, or re- 
testing without repetition of content. Explicit, printed 
scoring scales permit an interscorer reliability of .95; 
estimates of subject reliability are .79 for one form, 
.88 for two, and .92 for all three. The 40-item “best
answer” version is a multiple-choice test with a corrected
split-half reliability of .88. The clinical and
best answer forms correlate from .81 to .90. The
tentative norms, which need improvement, are based 
on modest numbers of pupils from grades 5 to 12 
and college students, all from Texas, and substantial 
numbers of Air Force enlistees. What does the test 
measure? Although a logical case can be made for 
“abstract thinking,” correlations with other tests seem 
to reveal mostly that ubiquitous factor, verbal in- 
telligence (r with vocabulary, .80). Clinical studies 
of patients, however, show that the test may tap 
schizophrenic disturbances of thought processes, espe- 
cially if interpreted in relation to vocabulary. The 
special “abstract” and “concrete” scores obtainable 
from the test are of interest in relation both to mental
development and to psychopathology.—L. F. S.
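
The reliabilities quoted above for one, two, and three clinical forms (.79, .88, .92) are consistent with the Spearman-Brown formula for a lengthened test. The review does not say how the manual obtained these figures, so the following is only a sketch under that assumption; the function name `spearman_brown` is hypothetical.

```python
def spearman_brown(r_single: float, k: float) -> float:
    """Projected reliability of a test made k times as long,
    given the reliability r_single of the original length."""
    return k * r_single / (1.0 + (k - 1.0) * r_single)


# Reliability reported in the review for a single clinical form.
r_one_form = 0.79

# Project the reliability of one, two, and three forms combined.
for k in (1, 2, 3):
    print(k, round(spearman_brown(r_one_form, k), 2))
```

Under this assumption the projected values for two and three forms round to the figures cited in the review.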


Harrison, M. Lucille, & Stroud, James B. Harrison-
Stroud Reading Readiness Profiles. Grades kgn-1.
1 form. 5 subtests administerable to small groups,
1 subtest individual; (80) min., in 4 sessions. Test
booklets, pp. 12 ($3.45 per 35) with class record
sheet, and teachers manual, pp. 23; specimen set
(60¢). Boston: Houghton Mifflin, 1956.

The Reading Readiness Profiles are group tests of 
the abilities and skills which children use in learning 
to read. The subtests require using symbols, making 
visual and auditory discriminations, using contexts, 
and knowing the letters of the alphabet. The mate- 
rials seem attractive and practicable for use with 
five- and six-year-olds. The six parts are interpreted 
as a profile, not combined to produce a single score. 
Interpretations are made in terms of five degrees of 
reading readiness, and are related to the instructional 
needs of individual pupils. Percentile norms for the 
subtests are adequately based on a national sample 
of over 1,400 pupils. The profile method of inter- 
pretation raises the common hazard of applying 
scores obtained from brief subtests. For example, the 
20th, 40th, and 60th percentiles are used as impor- 
tant cutoff points, and on Test 1 these percentiles 
are represented by raw scores of 17, 19, and 20 
points, which may not be significantly different. No 
technical data on reliability are given, a fault per- 
haps excusable in a manual intended for the use of 
teachers. But surely not excusable is the absence of 
any precautionary statements about drawing conclu- 
sions from small differences in raw scores. Validation 
is described in terms of content validity, with some 
data on the internal consistency coefficients of items. 
No evidence is given of the validity of the test for 
predicting subsequent progress in learning to read. 
In these respects, the test falls far short of meeting 
appropriate technical standards.—L. F. S.


Leiter, Russell G. Leiter Adult Intelligence Scale. 
Manual, pp. 52 ($2.00). Chicago: C. H. Stoelting 
Co., no date (1956). 

The Leiter Adult Intelligence Scale is a brief indi- 
vidual test for adults which requires about 40 min- 
utes of administration time. It seems close kin to 
the Army Individual Test of World War II, to 
which the author made considerable contributions.
The three language subtests are similarities-differences,
digits forward and backward, and a story
memory test; the three nonverbal are a pathways 
test, stencil designs, and painted cubes. Four of these 
six have been previously reviewed as separate tests 
(J. consult. Psychol., 1949, 13, 386; 1950, 14, 162- 
163). The manual contains sufficient directions for 
administering and scoring, and tables of adult IQ 
norms for each subtest, the two subtotals, and the 
total score. Reliabilities, intercorrelations, sex differ- 
ences, and correlations with other tests are satisfac- 
torily reported for several groups. There is no state- 
ment about the sample used for the derivation of the 
norms. Implications from previously published stud- 
ies suggest that the IQ norms were obtained entirely 
from 256 young male veterans who had also been 
given the Stanford Binet. Since the reported studies 
of sex differences concern college women and men 
only, the extension of the norms to women, and indeed
to men in general, seems a bit risky.—L. F. S.


Leiter, Russell G. Leiter International Performance 
Scale. Manual, Part I, Evidences of reliability and 
validity, pp. 72 ($2.50). Chicago: C. H. Stoelting 
Co., no date (1956). 

The manual is a well-documented collection of data
about the Leiter International Performance Scale 
(LIPS), and draws both on hitherto unpublished 
studies by its author and published articles by the 
author and others. Most of the detailed information, 
including all of the author’s own material, relates to 
versions of the scale earlier than the current 1948 re- 
vision. Still, the development of the test has been 
continuous, and the earlier data are of relevant in- 
terest. Item analyses, frequency distributions of the 
scores of various groups, and correlations with tests 
and other meaningful criteria reveal that the LIPS 
is probably a useful psychometric instrument. It 
correlates about .70 to .80 with the Stanford Binet 
and a little higher with performance tests such as 
the Arthur Scale and the Progressive Matrices. Sur- 
prisingly lacking is any clear evidence which would 
defend the word “international” in the Scale’s title:
its degree of freedom from cultural biases. This central
issue is not treated explicitly, but data on the
administration of the 1936 version to various races 
in Hawaii seem to suggest that the LIPS is scarcely 
more culture-free than the Binet. Part II of the 
manual, the directions for administering the 1948 re- 
vision, had been published previously (Psychol. Serv. 
Cent. J., 1950, 2, 259-343).—L. F. S.


Seashore, Harold G., & Bennett, G. K. Seashore- 
Bennett Stenographic Proficiency Test. Manual, 
Revised 1956, pp. 8. New York: Psychological 
Corp., 1956. 


The test is a worksample, now available on phonograph
records and tape (see J. consult. Psychol.,
1947, 11, 341). The revised manual contains norms
based on 1,475 applicants and 154 experienced em- 
ployees. The correlations of the test with various 
ratings of stenographic ability run from .50 to .70.— 
L. F. S. 

Seashore Measures of Musical Talents. Manual, Revised
1956, pp. 11. New York: Psychological Corp.,
1956. 


The new manual, prepared by the Test Division 
Staff of the Psychological Corporation, replaces the 
1939 edition by C. E. Seashore and others. The 
manual describes the tests, gives instructions for ad- 
ministration and scoring, and emphasizes precautions 
in interpretation. Validity is still defended in terms 
of content, and no evidences of prediction are cited. 
Reliabilities of the parts range from .55 to .84 at 
various grade levels. Percentile norms are now based 
on substantial thousands of cases.—L. F. S.


Terman, Lewis M. Concept Mastery Test. Superior 
adults. 1 form. Untimed, (40) min. Test booklet 
($3.00 per 25) with keys, and manual, pp. 10; 
IBM answer sheet ($1.85 per 50); specimen set 
(35¢). New York: Psychological Corp., 1956. 


In 1939 and 1950, Terman constructed two forms 
of a verbal ability test designed to have sufficient
ceiling for use in the follow-up studies of his intellectually
gifted children as adults. The second version,
called Form T, is now released for use with
examinees such as superior graduate students for
whom few if any other tests have sufficient diffi- 
culty. The test consists of two parts, a synonym- 
antonym subtest which seems to measure vocabulary 
level and sensitivity to perceiving the relations be- 
tween verbal concepts, and an analogies subtest 
which probes concepts drawn from a wide range of
informational areas. The alternate-form reliability is 
reported as .94 for students with an interval of 
one day to one week, and a remarkable .87 for the 
gifted subjects when retested after 11 to 12 years. 
Validity is explored by correlations with other tests 
(e.g., .70 with the CEEB’s SAT), and relationships
to the original IQs of the gifted subjects and to their 
educational attainments. The test correlates .49 with
the grade-point averages of college students, and .37 
with the four-year undergraduate record of graduate 
students. No other predictive validities are cited. The 
range of the 190-item test is indicated by the 95th 
percentile score of the gifted subjects of 177, and 
the mean score of 35 of a small group of Air Force 
personnel with less than high school education. The 
test is an interesting contribution to research on the 
highest levels of verbal ability.—L. F. S.